Robust methods for missing data in electronic health records-based studies
Project Number
5R01DK128150-04
Contact PI/Project Leader
HANEUSE, SEBASTIEN
Awardee Organization
HARVARD SCHOOL OF PUBLIC HEALTH
Description
Abstract Text
PROJECT SUMMARY
Electronic health record (EHR) data represent a huge opportunity for cost-efficient clinical and public health
research, especially when a randomized trial or a prospective observational study is not feasible or ethical. EHR
systems, however, are typically developed to support clinical and/or billing activities. As such, substantial care
is needed when using EHR data to address a particular scientific question. In this, an important potential threat
to validity is missing data. Moreover, since EHR data are not collected for any particular research question, it
will often be the case that measurements that are critical to answering the question will be unavailable in the
record of some patients. This, in turn, requires researchers to contend with the potential for selection bias and
compromised generalizability.
Towards addressing issues of missing data in an EHR, researchers could, in principle, appeal to a vast
statistical literature and use standard methods such as multiple imputation (MI), inverse-probability weighting
(IPW) or doubly-robust (DR) estimation. These methods, however, have generally been developed outside of the
EHR context. As such, they typically fail to acknowledge the complexity of the EHR data, in particular the many
decisions made by patients and health care providers that give rise to "complete data" in the EHR, referred to as
the data provenance. Because of the disconnect between this complexity and the settings for which most missing
data methods are developed, the application of standard missing data methods to EHR-based studies will often
fail to resolve selection bias and generalizability will remain compromised.
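
For readers unfamiliar with the standard methods named above, the following is a minimal sketch of IPW for a mean with missing outcome values. It is an illustration only and is not the methodology proposed in this project. It assumes simulated data, a single binary covariate that is always observed, and missingness that depends only on that covariate (missing-at-random); the function names `ipw_mean` and `simulate` are hypothetical.

```python
import random


def ipw_mean(records):
    """Inverse-probability-weighted mean of y from partially observed data.

    records: list of (x, y) pairs; x is a discrete covariate that is always
    observed, y is a float or None when missing. Assumes (for illustration)
    that missingness depends only on x.
    """
    # Step 1: estimate P(y observed | x) within each stratum of x.
    totals, observed = {}, {}
    for x, y in records:
        totals[x] = totals.get(x, 0) + 1
        if y is not None:
            observed[x] = observed.get(x, 0) + 1
    prob_obs = {x: observed.get(x, 0) / totals[x] for x in totals}

    # Step 2: weight each complete case by 1 / P(y observed | x).
    num = den = 0.0
    for x, y in records:
        if y is not None:
            w = 1.0 / prob_obs[x]
            num += w * y
            den += w
    return num / den


def simulate(n=20000, seed=0):
    """Simulated data (not from any real study); the true mean of y is 1.0."""
    rng = random.Random(seed)
    records, full = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        full.append(y)
        # y is observed far more often when x == 1 than when x == 0.
        p_obs = 0.9 if x == 1 else 0.3
        records.append((x, y if rng.random() < p_obs else None))
    return records, full


records, full = simulate()
complete = [y for _, y in records if y is not None]
naive = sum(complete) / len(complete)   # complete-case mean, biased upward here
weighted = ipw_mean(records)            # IPW mean, close to the full-data mean
full_mean = sum(full) / len(full)
print(f"complete-case: {naive:.2f}  IPW: {weighted:.2f}  full data: {full_mean:.2f}")
```

In this toy setting the weights are estimated from a single, fully observed covariate. The concern raised above is that EHR data rarely arise from so simple a mechanism, so a weighting model of this kind may leave selection bias in place.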
Unfortunately, in contrast to confounding bias, very little attention has been paid to developing methods for
missing data that are specifically tailored to the complexity of EHR-based studies. We will begin to address this
gap by developing, implementing and evaluating a suite of novel, innovative statistical tools including: Aim 1: A
unified framework for robust causal inference in unmatched and matched EHR-based cohort studies with missing
confounder data; Aim 2: A formal, robust framework for causal inference in emulated target trials based on EHR
data; Aim 3: A novel blended analysis framework for missing data in EHR-based studies that combines MI and
IPW in an innovative and unique way; Aim 4: A novel double-sampling strategy for when the EHR data are
suspected to be missing-not-at-random.
The proposed aims are motivated by challenges the investigative team has faced in a series of EHR-based
studies of long-term outcomes among patients who have undergone bariatric surgery. Throughout this research,
we will use data from one of these studies, the DURABLE study, which has rich demographic and longitudinal
clinical information from three Kaiser Permanente health systems on ≈45,000 patients who underwent bariatric
surgery between 1997 and 2015, as well as on ≈1,636,000 non-surgical enrollees during that time period.
Public Health Relevance Statement
PROJECT NARRATIVE
Although electronic health records (EHR) data represent a huge opportunity for cost-effective clinical and
public health research, care is needed since they are typically not collected for that purpose. Among the many challenges
that researchers face in using EHR data, missing data poses a substantial, complex and yet unappreciated threat
to validity. Motivated by our team's experience in conducting research of long-term outcomes following bariatric
surgery, this proposal aims to develop a suite of novel, innovative statistical tools for dealing with missing data in
EHR-based studies.
Project Terms
Address; Attention; Caring; Clinical; Cohort Studies; Complex; Data; Data Provenance; Decision Making; Electronic Health Record; Eligibility Determination; Ethics; Face; Health Personnel; Health system; Literature; Longitudinal Studies; Measurement; Methodology; Methods; Observational Study; Outcome; Patients; Probability; Research; Research Design; Research Personnel; Sampling; Selection Bias; Series; Specific qualifier value; Statistical Methods; Techniques; Time; bariatric surgery; cohort; cost effective; cost efficient; design; electronic health record system; epidemiology study; experience; flexibility; innovation; novel; opportunity cost; prospective; public health research; randomized trial; semiparametric; tool
National Institute of Diabetes and Digestive and Kidney Diseases
CFDA Code
847
DUNS Number
149617367
UEI
UNVDZNFA8R29
Project Start Date
12-April-2021
Project End Date
31-March-2026
Budget Start Date
01-April-2024
Budget End Date
31-March-2026
Project Funding Information for 2024
Total Funding
$496,299
Direct Costs
$376,224
Indirect Costs
$120,075
Year | Funding IC | FY Total Cost by IC
2024 | National Institute of Diabetes and Digestive and Kidney Diseases | $496,299