Enhanced deconvolution and prediction of mutational signatures
Project Number: 5R21CA226188-02
Contact PI/Project Leader: CAMPBELL, JOSHUA D
Awardee Organization: BOSTON UNIVERSITY MEDICAL CAMPUS
Description
Abstract Text
The goals of this proposal are to develop novel statistical tools and a software package for performing mutational
signature deconvolution in cancer samples. Mutational signatures are patterns of co-occurring mutations that
can reveal insights into a cancer's etiology and evolution. Currently, non-negative matrix factorization (NMF) is
the “gold standard” for mutational signature deconvolution. However, NMF has several deficiencies: it cannot
1) easily characterize patterns within the flanking sequence beyond the
trinucleotide context, 2) simultaneously characterize patterns of several genomic features, or 3) predict
mutational signatures of new samples given a previously trained model.
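For readers unfamiliar with the NMF baseline described above, the following is a minimal sketch, not the project's implementation. It factors a small mutation-count matrix (categories by samples) into signatures and exposures using the standard Lee-Seung multiplicative updates. The function names, the toy matrix, and the choice of the Frobenius objective are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor V (categories x samples) into W (categories x k) and H (k x samples).

    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    W plays the role of signatures and H the role of per-sample exposures.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Reconstruction error ||V - WH|| in the Frobenius norm."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Toy data (hypothetical): 4 mutation categories, 6 samples, 2 underlying signatures.
true_signatures = [[0.7, 0.1], [0.2, 0.1], [0.05, 0.3], [0.05, 0.5]]
true_exposures = [[100, 80, 10, 0, 50, 20], [0, 20, 90, 100, 50, 60]]
counts = matmul(true_signatures, true_exposures)

w, h = nmf(counts, k=2)
print("reconstruction error:", frobenius_error(counts, w, h))
```

Note that the factorization is fit jointly to the whole cohort. The sketch has no separate step for scoring a new sample against previously learned signatures, which corresponds to the third limitation listed above.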
In this proposal, we will develop a novel discrete Bayesian hierarchical model to characterize mutational
signatures in tumor sequencing data that overcomes the limitations of NMF. These types of models are
commonly used in text mining applications to infer topics by examining co-occurring word counts across
documents. Our model will be able to characterize information about the flanking sequence far beyond the
trinucleotide context, incorporate information from other genomic features such as strand or region, and predict
signatures in single samples. Importantly, unlike NMF, the inclusion of extra genomic features in our
clustering algorithm will not result in loss of power for discovery and will aid in prediction of mutational
signatures in targeted sequencing data by incorporating additional information.
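One way to see why wider flanking context and extra features strain a count-matrix approach is to count the categories that result when every feature is cross-classified into a single matrix. The sketch below does that arithmetic. It assumes the usual convention of 6 pyrimidine-centered substitution types and 4 possible bases at each flanking position; the function name and the strand and region parameters are hypothetical, and the abstract does not say how the proposed model itself represents these features.

```python
def n_mutation_categories(flank, n_strands=1, n_regions=1):
    """Number of joint categories when all features are cross-classified.

    flank: number of flanking bases considered on each side of the mutated base.
    n_strands, n_regions: levels of optional extra features (hypothetical).
    """
    if flank < 0 or n_strands < 1 or n_regions < 1:
        raise ValueError("flank must be >= 0 and feature levels must be >= 1")
    substitution_types = 6          # C>A, C>G, C>T, T>A, T>C, T>G
    contexts = 4 ** (2 * flank)     # 4 bases at each of 2*flank positions
    return substitution_types * contexts * n_strands * n_regions


print(n_mutation_categories(1))               # trinucleotide context: 96
print(n_mutation_categories(2))               # pentanucleotide context: 1536
print(n_mutation_categories(3))               # heptanucleotide context: 24576
print(n_mutation_categories(1, n_strands=2))  # trinucleotide plus strand: 192
```

With a fixed number of mutations per tumor, each added feature spreads the same counts over many more rows, so most cells in the matrix become zero or near zero.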
We will also develop an R/Bioconductor package for data preprocessing, inference, and visualization, which will
streamline mutational signature analysis for researchers. Both NMF and our novel model will be available in the
package so users can compare and contrast the different computational approaches for mutational signature
inference. In addition, this package will be able to interface with several existing projects from the
Informatics Technology for Cancer Research (ITCR) program. Finally, we will generate reference mutational
signatures by analyzing a large-scale cancer exome sequencing dataset from The Cancer Genome Atlas
(TCGA) that can be used to predict mutational signatures in single samples in clinical workflows. Overall, our
model will be of great interest to the cancer community as it will provide greater insights into mutational signature
patterns and will be useful in clinical settings where mutational signature inference is performed in single
samples.
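Predicting signatures in a single sample from a fixed reference set can be illustrated with a simple expectation-maximization (EM) routine that holds the signatures constant and estimates only the mixing proportions. This is a maximum-likelihood simplification in the spirit of topic-model "fold-in", not the Bayesian hierarchical model proposed here. The function name, the two toy signatures, and the sample counts are all hypothetical.

```python
def predict_exposures(counts, signatures, iters=200):
    """Estimate mixing proportions of fixed signatures for one sample via EM.

    counts: mutation counts per category for a single sample.
    signatures: list of probability vectors over the same categories,
                assumed to come from a previously trained reference model.
    Returns one proportion per signature, summing to 1.
    """
    k = len(signatures)
    if sum(counts) == 0:
        raise ValueError("sample has no mutations")
    weights = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for c, n in enumerate(counts):
            if n == 0:
                continue
            # E-step: responsibility of each signature for mutations in category c.
            joint = [weights[s] * signatures[s][c] for s in range(k)]
            z = sum(joint)
            if z == 0:
                continue  # category has zero probability under every signature
            for s in range(k):
                expected[s] += n * joint[s] / z
        # M-step: renormalize expected mutation counts into proportions.
        total = sum(expected)
        if total == 0:
            raise ValueError("no observed mutation is explained by any signature")
        weights = [x / total for x in expected]
    return weights


# Two hypothetical reference signatures over 4 mutation categories.
sig_a = [0.7, 0.2, 0.05, 0.05]
sig_b = [0.1, 0.1, 0.3, 0.5]

# One new sample, constructed as an 80/20 mix of the two signatures.
sample = [580, 180, 100, 140]

print(predict_exposures(sample, [sig_a, sig_b]))
```

Because the toy sample was built as an exact 80/20 mixture, the estimated proportions should approach those values. Real samples, especially from targeted panels with few mutations, would give noisier estimates.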
Public Health Relevance Statement
Chemicals and biological processes can cause the mutations that are observed in human tumors. Understanding
the patterns of mutations (i.e., mutational signatures) in tumors that have undergone DNA sequencing can reveal
insights about different mutagenic processes and how tumors develop. Since current computational methods
such as non-negative matrix factorization (NMF) do not completely characterize these mutational patterns and
cannot predict signatures in single samples, we will develop a novel computational method and corresponding
R package that can better characterize mutational patterns in tumors and be used to predict signatures in single
clinical samples.
NIH Spending Category
Cancer; Cancer Genomics; Genetics; Human Genome
Project Terms
Algorithms; Bayesian Modeling; Bioconductor; Bioinformatics; Biological Process; Cancer Etiology; Cancer Research Project; Carcinogens; Chemicals; Clinical; Communities; Computer software; Computing Methodologies; Cytidine Deaminase; DNA sequencing; Data; Data Set; Disadvantaged; Epidemiology; Etiology; Evolution; Fingerprint; Genome; Genomics; Goals; Gold; Head and Neck Cancer; Human; Individual; Informatics; Length; Location; Malignant Neoplasms; Malignant neoplasm of lung; Modeling; Mutation; Pathologic Mutagenesis; Pattern; Performance; Probability; Process; Research Personnel; Running; Sampling; Smoking; Technology; Testing; The Cancer Genome Atlas; Training; Visualization; base; cancer genome; cancer genomics; cancer type; cigarette smoke; clinical diagnostics; cohort; exome; exome sequencing; exposure to cigarette smoke; factor A; flexibility; genome sequencing; insight; interest; method development; novel; predictive signature; protein activation; targeted sequencing; text searching; tool; tumor; whole genome