Awardee Organization: UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Description
Abstract Text
Project Summary
The widespread adoption of Electronic Health Records (EHRs) has enabled the use of clinical data for clinical
research and healthcare delivery. Many institutions have established clinical data warehouses (CDWs) in
conjunction with cohort discovery tools (e.g., i2b2) to support the use of clinical data for clinical research,
including retrospective clinical studies as well as feasibility assessment or patient recruitment for clinical trials.
However, a significant portion of relevant patient information is embedded in clinical narratives, and natural
language processing (NLP) techniques such as information extraction are critical when using EHR data for
clinical research. Many clinical NLP systems have been developed to extract information from text for various
downstream applications, but they have shown unsatisfactory performance and poor portability. Information retrieval
(IR), a technique used in search engines for storing, retrieving, and ranking documents from a large collection of
text documents based on users’ queries, can provide an alternative approach to leverage clinical narratives for
cohort discovery, as it is less dependent on semantics. To accomplish this, additional work is needed,
since current IR approaches are generally document-based, and the formulation of cohort discovery as an IR
task requires the development of innovative IR approaches to handle complex EHR data and cohort criteria with
contextual (e.g., spatial or temporal) constraints.
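As a rough illustration of the gap described above, the following minimal sketch scores individual notes against a free-text query and then rolls the note scores up to the patient level. It is not the project's method: the TF-IDF weighting, the max-over-notes aggregation, the function name rank_patients, and the example notes are all assumptions made for illustration, and contextual (e.g., temporal) constraints are not modeled.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use richer NLP)."""
    return text.lower().split()


def rank_patients(notes, query):
    """Rank patients by their best-matching note.

    notes: list of (patient_id, note_text) pairs (hypothetical input format).
    Returns a list of (patient_id, score) sorted by descending score.
    """
    docs = [(pid, Counter(tokenize(text))) for pid, text in notes]
    n_docs = len(docs)

    # Document frequency: number of notes containing each term.
    df = Counter()
    for _, counts in docs:
        df.update(counts.keys())

    query_terms = tokenize(query)
    patient_scores = defaultdict(float)
    for pid, counts in docs:
        # Document-level TF-IDF score for this single note.
        score = 0.0
        for term in query_terms:
            if term in counts:
                idf = math.log(1 + n_docs / df[term])
                score += counts[term] * idf
        # Patient-level aggregation: keep the highest-scoring note per patient.
        patient_scores[pid] = max(patient_scores[pid], score)

    return sorted(patient_scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    example_notes = [
        ("p1", "type 2 diabetes with neuropathy"),
        ("p1", "routine follow up visit"),
        ("p2", "hypertension controlled on medication"),
        ("p3", "diabetes screening negative"),
    ]
    for pid, score in rank_patients(example_notes, "diabetes neuropathy"):
        print(pid, round(score, 3))
```

Note that this bag-of-words sketch ranks p3 above p2 even though the p3 note reports a negative screening. Limitations of this kind are one reason purely document-based retrieval needs extending before it can serve cohort criteria.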
Our long-term goal is to develop informatics solutions to accelerate the use of EHR data for clinical research.
The main goal of this proposal is to develop innovative IR methods, which formulate cohort discovery from EHR
data as an IR task, aiming to accelerate the identification of patient cohorts for cohort studies or the recruitment
of eligible patients for clinical trials. In our current R01-supported study (R01LM011934), we introduced novel
language models to enable the reuse of NLP-produced artifacts for IR-based cohort retrieval and developed
parallel resources for IR evaluation at two institutions (Mayo Clinic and OHSU). We hypothesize that, given
complex cohort criteria with contextual constraints, an IR framework with tailored architecture components (e.g.,
indexing, ranking, evaluation, and query processing) for storing and querying EHR data has an advantage over
traditional cohort discovery tools for querying unstructured EHR data as well as an advantage over text-based
search engines for querying both structured and unstructured EHR data. For the proposed renewal, we plan to
i) adopt common data models (CDMs) and deploy the framework at one additional site to assess the
generalizability of methods, ii) extend the IR framework to incorporate contextual information, and iii)
incorporate deep semantic representations into the IR framework. If successful, the proposed project will
advance informatics research on cohort discovery and identification, which impacts many applications based on
EHR data such as learning healthcare systems, predictive modeling, or AI in healthcare.
Public Health Relevance Statement
Narrative
The widespread adoption of Electronic Health Records (EHRs) has enabled secondary use of EHR data for
clinical and translational research. We propose to advance informatics solutions to enable cohort discovery
using information retrieval and deep representation techniques.
Project Terms
Acceleration; Adopted; Adoption; Algorithms; Architecture; COVID-19; Clinic; Clinical; Clinical Data; Clinical Research; Clinical Trials; Cohort Studies; Collection; Communities; Complex; Data; Development; Electronic Health Record; Eligibility Determination; Evaluation; Feedback; Formulation; Goals; Healthcare; Healthcare Systems; Human; Informatics; Information Retrieval; Institution; Language; Learning; Medical; Methods; Modeling; Morphologic artifacts; Natural Language Processing; Outcome; Patient Recruitments; Patients; Performance; Pharmaceutical Preparations; Process; Research; Resources; Retrieval; Semantics; Site; Structure; System; Techniques; Text; Time; Training; Translational Research; Visit; Work; clinical data warehouse; clinical research site; clinical trial recruitment; cohort; data modeling; data standards; density; design; experimental study; health care delivery; heterogenous data; indexing; innovation; learning strategy; neural; novel; open source; portability; predictive modeling; query tools; recruit; search engine; structured data; tool
No Sub Projects information available for 5R01LM011934-09