Removing batch effects in genomic and epigenomic studies
Project Number: 7R01GM127430-05
Former Number: 5R01GM127430-04
Contact PI/Project Leader: JOHNSON, WILLIAM EVAN
Awardee Organization: RUTGERS BIOMEDICAL AND HEALTH SCIENCES
Description
Abstract Text
Project Summary/Abstract
Combining high-throughput biomedical data sets from multiple studies is advantageous to increase statistical
power in studies where logistical considerations restrict sample size or require the sequential generation of data.
However, significant technical heterogeneity is commonly observed across multiple batches of data that are
generated from different processing or reagent batches, experimenters, protocols, or profiling platforms. These
so-called batch effects confound true relationships in the data, reducing the power benefits of combining multiple
batches of data, and may even lead to spurious results. Many methods have been proposed to filter technical
heterogeneity from genomic data. These methods are designed to remove batch effects, unmeasured or
“surrogate” variation, or other “unwanted” variation caused by biological or technical sources. Although these
approaches represent impactful advances in the field, there are still significant gaps that need to be addressed
to appropriately filter technical heterogeneity from -omics data and other high-throughput datasets. For example,
many existing methods assume relevant covariates are known or that raw data are generally independent. Some
applications require more nuanced correction, including single cell transcriptomics data that are often missing
cell-type identifiers, microbiome and mRNA-seq data that are compositional in nature, and imaging and spatial
transcriptomics data that have spatially correlated data points. Furthermore, batch correction introduces
correlation into the adjusted data, which needs to be accounted for in downstream analyses. Most
researchers performing batch correction are unaware of this effect and often apply downstream analysis
tools incorrectly. Finally, there is still significant need for additional software tools and benchmark
datasets for evaluating batch effect methods and their efficacy in specific datasets. We propose to develop
algorithms and software to address these specific research gaps facing researchers combining data from
multiple experimental batches.
Public Health Relevance Statement
Project Narrative
Significant technical heterogeneity and batch effects are commonly observed across multiple batches of data.
We will develop algorithms and software to address specific gaps facing researchers combining data from
genomic or epigenomic experiments. We will develop algorithms and software for integrating single-cell,
microbiome, and spatial/imaging data, and for benchmarking batch effect methods and their performance.