Bayesian genetic association analysis of all rare diseases in the Kids First cohort
Project Number: 1R03HD111492-01
Contact PI/Project Leader: TURRO, ERNEST
Awardee Organization: ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
Description
Abstract Text
Rare diseases affect 1 in 20 people, but fewer than half of the ~10,000 catalogued rare diseases have a resolved
genetic etiology. Genetic association analyses of whole-genome sequencing (WGS) data from large,
phenotypically diverse collections of rare disease patients enhance the discovery of novel etiologies, compared
to within-study analyses, by increasing the probability of multiple cases sharing a genetic etiology and by boosting
the number of controls (Turro et al., Nature 2020). The Gabriella Miller Kids First (KF) program has germline
WGS data from 20 studies on 18,547 probands or relatives of probands with a birth defect or pediatric cancer.
However, due to the bioinformatic and statistical challenges of analyzing such large and complex WGS datasets,
a comprehensive cross-cutting genetic association analysis has never been performed. We present a research
program of computational and statistical approaches to uncover novel germline etiologies of rare diseases in KF
and replicate them in other cohorts to which we have access. In Aim 1, we will build a compact and portable
relational database containing a sparse representation of all the rare variant genotypes in the KF WGS data. Due
to natural selection, almost all pathogenic variants responsible for rare congenital or hereditary disorders are rare
and will thus be included. We will annotate the variants with scores reflecting their predicted deleteriousness and
their minor allele frequencies, and with their predicted molecular consequences. We will load sample-specific
information into the database, including pedigree membership, membership of a maximal set of unrelated participants
(MSUP) and group memberships for case/control association analyses. In Aim 2, we will develop a web
application allowing authenticated users to browse variants by gene or sample. By clicking on a sample ID in a
table of genotypes, users will be able to view, in a side panel, the phenotypes of individuals who are heterozygous,
homozygous or compound heterozygous for a given consequence class of rare variants.
The application will also host and display the results of inference, such as posterior probabilities of association
(PPAs), posterior probabilities over the mode of inheritance, posterior probabilities over the consequence class of
pathogenic variants and posterior probabilities of the pathogenicities of variants. The application will be accessible
to authorized collaborating experts across disciplines. In Aim 3, we will obtain a PPA between each gene and
each of a collection of case sets in KF in accordance with each study's data restrictions, if any. We will determine
the case sets using Mondo Disease Ontology and Human Phenotype Ontology terms assigned to cases. We will
select probands in a given case set using pedigree information and compare them to participants not in the case
set who are in other pedigrees and in the MSUP. We will attempt to replicate findings with a PPA >0.95 in our
previously deployed databases encompassing >100,000 individuals and using GeneMatcher. The deployment of
powerful, lightweight and portable analytical frameworks across different patient collections promises to advance
etiological discovery and replication of the remaining unknown causes of congenital disorders.
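
The sparse relational representation described in Aim 1 can be illustrated with a minimal sketch. It assumes SQLite as the storage engine, and the table names, column names, toy rows and the 0.001 allele frequency cutoff are all hypothetical; none of them are specified in the abstract. Only non-reference genotypes are stored, so a sample absent from the genotype table for a given variant is implied to be homozygous reference. The query groups rare genotypes by sample within a gene, in the spirit of the gene-level browsing described in Aim 2. Because phase is not modeled here, two heterozygous calls in the same gene are labeled only as possible compound heterozygosity.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE variant (
    variant_id  INTEGER PRIMARY KEY,
    gene        TEXT NOT NULL,
    consequence TEXT NOT NULL,   -- predicted molecular consequence
    maf         REAL NOT NULL    -- minor allele frequency
);
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    pedigree_id TEXT NOT NULL,
    in_msup     INTEGER NOT NULL -- 1 if in the maximal set of unrelated participants
);
-- Sparse storage: one row per non-reference genotype only.
CREATE TABLE genotype (
    variant_id INTEGER NOT NULL REFERENCES variant(variant_id),
    sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
    alt_count  INTEGER NOT NULL CHECK (alt_count IN (1, 2)),
    PRIMARY KEY (variant_id, sample_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO variant VALUES (?, ?, ?, ?)",
        [
            (1, "GENE_A", "missense", 1e-4),
            (2, "GENE_A", "stop_gained", 5e-5),
            (3, "GENE_B", "missense", 2e-4),
        ],
    )
    con.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [("S1", "P1", 1), ("S2", "P2", 1), ("S3", "P3", 1), ("S4", "P3", 0)],
    )
    con.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [(1, "S1", 1), (2, "S1", 1), (1, "S2", 1), (3, "S3", 2), (3, "S4", 1)],
    )
    return con


def zygosity_by_gene(con, gene, max_maf=0.001):
    """Map each carrier of a rare variant in `gene` to a zygosity label."""
    rows = con.execute(
        """
        SELECT g.sample_id, MAX(g.alt_count), COUNT(*)
        FROM genotype AS g
        JOIN variant AS v ON v.variant_id = g.variant_id
        WHERE v.gene = ? AND v.maf < ?
        GROUP BY g.sample_id
        ORDER BY g.sample_id
        """,
        (gene, max_maf),
    ).fetchall()
    labels = {}
    for sample_id, max_alt, n_variants in rows:
        if max_alt == 2:
            labels[sample_id] = "homozygous"
        elif n_variants >= 2:
            # Phase is unknown in this sketch, so this is only a candidate.
            labels[sample_id] = "possible compound heterozygous"
        else:
            labels[sample_id] = "heterozygous"
    return labels


if __name__ == "__main__":
    con = build_demo_db()
    for gene in ("GENE_A", "GENE_B"):
        print(gene, zygosity_by_gene(con, gene))
```

Storing only non-reference calls keeps the genotype table small when nearly every genotype at a rare variant is homozygous reference, which is one plausible reading of "compact and portable"; the project's actual schema and tooling may differ.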
Public Health Relevance Statement
The genetic analysis of rare diseases across multiple studies has the potential to reveal novel causes of disease
that might be missed in isolated study-specific analyses. Although the Gabriella Miller Kids First (KF) cohort
has genetic and clinical data on almost 20,000 people who have, or who are related to someone who has, a
birth defect or pediatric cancer, the cohort has not been analyzed jointly. We will develop and deploy a set of
computationally efficient and statistically powerful methods for discovering novel causes of rare diseases in KF
and validate our results in other rare disease cohorts to which we have access.
Biotechnology; Genetics; Human Genome; Pediatric; Rare Diseases