Computational tools for accurate inference of genetic ancestry from cancer-derived molecular data
Project Number: 1U01CA289357-01
Contact PI/Project Leader: KRASNITZ, ALEXANDER
Awardee Organization: COLD SPRING HARBOR LABORATORY
Description
Abstract Text
Project summary/abstract
For multiple cancer types, epidemiological data show strong correlations between the patient's
ancestral background and the incidence of the disease, its severity at diagnosis, and its clinical
outcome. This well-documented phenomenon
strongly suggests a link between the biology and genetics of cancer in an individual and the
individual's genetic ancestry. Indeed, recent research in cancer genomics, both pan-cancer and
cancer type-specific, points to genetic and phenotypic differences between tumors occurring in
patient populations with differing genetic ancestries, and to the need for more data collection to
power further study in this area. It is the purpose of this proposal to facilitate such data analysis
on a much greater scale, by enabling genetic ancestry inference directly from cancer-derived
molecular data, without the need for the patient's cancer-free genotype or self-declared race or
ethnicity. Successful completion of this project will unlock vast amounts of such data for ancestry-
oriented studies of cancer from two major sources. One is the body of data stored by the
Sequence Read Archive and similar massive digital repositories, on the order of 10^6
cancer-derived molecular profiles. The other is the body of archival tumor tissues across multiple medical
centers, from millions of which molecular data may be generated.
We will develop software tools for genetic ancestry inference from multiple types of
cancer-derived data, namely, DNA sequence data from whole exomes, whole genomes at low
coverage and targeted sequence panels; RNA sequence data; ATAC-seq and bisulfite-converted
sequence data. The tools to be developed will deliver inference of global genetic ancestry at a
sub-continental level of resolution, of ancestral admixtures and of local ancestry. These tools will
be adaptive, endowed with the ability to optimize their performance for each input cancer-derived
molecular profile. This adaptability will be achieved using simulated data, combining the input
cancer-derived profile with ancestral backgrounds representing well-defined population groups.
As a result, these inference methods will perform consistently, and with quantifiable accuracy,
across a range of profiling depths and qualities, and mitigate cancer-related damage to the
genome. An open-source, user-friendly and FAIR-compliant software implementation of these
methods will be made available to the research community through a number of channels,
including GitHub, Bioconductor and Galaxy. Training and community outreach for this software
will be provided in collaboration with the ITCR Training Network.
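The adaptive step described above, in which inference is tuned for each input profile using synthetic data of known ancestry, might be sketched roughly as follows. This is an illustration under stated assumptions, not the project's software: the k-nearest-neighbour classifier, the function names (simulate_genotypes, knn_predict, tune_k), the population labels and the allele frequencies are all hypothetical, and the input profile is reduced here to nothing more than the list of sites it covers.

```python
import random
from collections import Counter


def simulate_genotypes(freqs, n, rng):
    """Draw n diploid genotypes (0, 1 or 2 alternate alleles per site)
    from per-site allele frequencies of one reference population."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def knn_predict(query, reference, k, sites):
    """Majority vote among the k reference genotypes closest to the query,
    comparing only the sites that the input profile covers."""
    ranked = sorted(
        (sum(abs(query[s] - geno[s]) for s in sites), label)
        for label, geno in reference
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def tune_k(covered_sites, pop_freqs, candidates=(1, 5, 15),
           n_ref=30, n_synth=20, seed=0):
    """Pick the candidate k that best recovers the known ancestry of
    synthetic individuals, restricted to the profile's covered sites."""
    rng = random.Random(seed)
    # Hypothetical reference panel: simulated individuals per population.
    reference = [(pop, g) for pop, freqs in pop_freqs.items()
                 for g in simulate_genotypes(freqs, n_ref, rng)]
    # Synthetic test individuals whose true ancestry is known by design.
    synthetic = [(pop, g) for pop, freqs in pop_freqs.items()
                 for g in simulate_genotypes(freqs, n_synth, rng)]
    accuracy = {}
    for k in candidates:
        hits = sum(knn_predict(g, reference, k, covered_sites) == pop
                   for pop, g in synthetic)
        accuracy[k] = hits / len(synthetic)
    best_k = max(accuracy, key=accuracy.get)
    return best_k, accuracy
```

Calling tune_k with the covered sites of one profile and a dictionary of per-population allele frequencies returns the selected k together with the synthetic accuracy of each candidate, so the choice comes with a quantified accuracy estimate for that particular profile. A realistic version would also need to model sequencing depth and tumor-specific alterations, which this sketch leaves out.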
Public Health Relevance Statement
Project narrative
Mounting evidence from recent research, encompassing a variety of cancer types, points to links between
the biology of the disease and the patient's ancestral genetic background. It is the goal of this project to facilitate
large-scale data-driven investigation in this important area via methods to learn a patient's genetic ancestry from
tumor-derived molecular data, without the need to genotype the patient's cancer-free DNA or to know the
patient's ethnic self-identification. Successful completion of the project will make hundreds of
thousands of DNA and RNA profiles of tumors, from genomic data repositories, and millions of tumor specimens
in biobanks, most with no matching normal genotype and no racial/ethnic annotation, available for ancestry-
oriented cancer research.