Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia
Project Number: 5R01LM013879-04
Contact PI/Project Leader: ARCHER, KELLIE J.
Awardee Organization: OHIO STATE UNIVERSITY
Description
Abstract Text
Molecular features associated with time-to-event outcomes, such as overall or disease-free survival, may be
prognostically relevant or potential therapeutic targets. Therefore, analyzing data from high-throughput genomic
assays with clinical follow-up data has been of growing interest. The Cancer Genome Atlas (TCGA) Project has
collected baseline demographic data, clinical characteristics, and follow-up data for 11,125 patients across 32 different
cancer types, and corresponding tissue samples were processed to examine SNPs, copy number, methylation,
miRNA expression, and mRNA expression. Because the number of variables (P) exceeds the sample size (N),
one strategy frequently employed when associating molecular features with survivorship data is to fit univariable
Cox proportional hazards (PH) models followed by adjustment for multiple hypothesis tests using a false discovery
rate approach. However, most chronic conditions and diseases, including cancer, are likely caused by multiple
dysregulated genes or mutations. It is therefore critical to fit multivariable models in the presence of a high-
dimensional covariate space. Traditional statistical methods cannot be used when the number of features exceeds
the sample size (i.e., P > N), whereas penalized methods perform automatic variable selection and accommodate
the P > N scenario. Penalized approaches including LASSO, smoothly clipped absolute deviation (SCAD),
adaptive LASSO, and Bayesian LASSO have all been extended to Cox's PH model for handling high-dimensional
covariate spaces. However, when modeling survival or other time-to-event outcomes, the Cox PH model assumes
that all subjects will experience the event of interest, an assumption that is violated when a subset of subjects is cured.
In that setting, mixture cure models should be fit instead. Although mixture
cure models have been described for traditional settings where the number of samples exceeds the number
of covariates, limited variable selection methods and no methods for high-dimensional model fitting currently
exist for mixture cure models. Therefore, this project will overcome a critical barrier to progress in this field
by developing penalized parametric and semi-parametric mixture cure models applicable for high-dimensional
datasets. The specific aims of this application are to: (1) Develop penalized parametric mixture cure models
for high-dimensional datasets; and (2) Develop a penalized semi-parametric proportional hazards mixture cure
model for high-dimensional datasets. For both aims, we will characterize the performance of the methods using
extensive simulation studies, develop software, and distribute R packages through CRAN. In Aim 3, we will identify
molecular features associated with cure and survival using our large, unique AML dataset from the Alliance for
Clinical Trials in Oncology and assess robustness of findings using AML datasets from Gene Expression Omnibus
and The Cancer Genome Atlas project. This research will fill a critical gap as there are currently no mixture cure
models for high-dimensional data. We anticipate application of our methods to our AML data will enhance existing
risk stratification systems used in daily clinical practice that determine treatment intensity and modality.
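As a minimal illustration of the model class described in the abstract, and not the project's own software, the sketch below writes out an L1-penalized negative log-likelihood for a parametric mixture cure model. It assumes a logistic model for the probability of being uncured (the incidence component) and an exponential hazard for uncured subjects (the latency component). The function names, the exponential choice, and the separate penalty weights lam_inc and lam_lat are all assumptions made for illustration.

```python
import math


def population_survival(t, pi_uncured, rate):
    """Population survival under a mixture cure model.

    Cured subjects (probability 1 - pi_uncured) never experience the event,
    so the curve plateaus at 1 - pi_uncured instead of falling to zero.
    """
    return (1.0 - pi_uncured) + pi_uncured * math.exp(-rate * t)


def _dot(coefs, row):
    """Inner product of a coefficient list and one covariate row."""
    return sum(c * v for c, v in zip(coefs, row))


def penalized_neg_loglik(b0, b, beta0, beta, Z, X, times, events,
                         lam_inc, lam_lat):
    """L1-penalized negative log-likelihood (illustrative sketch only).

    Z, X    : lists of covariate rows for incidence and latency
    times   : observed follow-up times
    events  : 1 if the event was observed, 0 if censored
    lam_inc : LASSO weight on incidence coefficients b (assumed name)
    lam_lat : LASSO weight on latency coefficients beta (assumed name)
    """
    loglik = 0.0
    for z, x, t, d in zip(Z, X, times, events):
        # Incidence: logistic probability of being uncured (susceptible).
        pi = 1.0 / (1.0 + math.exp(-(b0 + _dot(b, z))))
        # Latency: exponential hazard among uncured subjects.
        rate = math.exp(beta0 + _dot(beta, x))
        if d == 1:
            # An observed event implies the subject was uncured:
            # contribution is log(pi * f_u(t)) with f_u(t) = rate * exp(-rate * t).
            loglik += math.log(pi) + math.log(rate) - rate * t
        else:
            # A censored subject is either cured or uncured and event-free so far.
            loglik += math.log(population_survival(t, pi, rate))
    # Intercepts are left unpenalized; only the slopes are shrunk.
    penalty = (lam_inc * sum(abs(v) for v in b)
               + lam_lat * sum(abs(v) for v in beta))
    return -loglik + penalty
```

Minimizing an objective of this form over the coefficients is what allows variable selection when P > N: the absolute-value penalties can shrink some coefficients exactly to zero. A semi-parametric version would replace the exponential latency term with a proportional hazards component and an unspecified baseline hazard.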
Public Health Relevance Statement
Relevance
In acute myeloid leukemia (AML), the advent of effective targeted therapies, intensified chemotherapy, and
allogeneic stem cell transplant has increased long-term survival such that a non-negligible subset of treated patients
is cured; when cured subjects are present in a dataset, mixture cure models should be used in place of
traditional survival analytic techniques. We will develop novel methodologies and software for fitting mixture cure
models that accommodate a high-dimensional feature space, enabling researchers to identify molecular
features from high-throughput genomic assays that are associated with likelihood of cure and time-to-event
outcomes. Our methods will enhance existing AML risk stratification systems used in daily clinical
practice that determine treatment intensity and modality, help identify potential therapeutic targets, and be broadly
applicable to the large number of genomic studies housed in NCBI's GEO portal that involve time-to-event
analyses and a cured fraction.
Project Terms
Acute Myelocytic Leukemia; American Cancer Society; Archives; Biological; Biological Assay; Cessation of life; Characteristics; Chronic; Clinical; Clinical Trials; Clip; Computer software; Cox Proportional Hazards Models; Data; Data Science; Data Set; Decision Making; Development; Diagnosis; Disease; Disease-Free Survival; Effectiveness; Event; Gene Expression; Genes; Genomics; High-dimensional Modeling; Malignant Neoplasms; Methodology; Methods; Methylation; MicroRNAs; Modality; Modeling; Molecular; Mutation; Oncology; Outcome; Patients; Performance; Population Study; Predisposition; Probability; Process; Prognosis; Property; RUNX1 gene; Research; Research Personnel; Research Project Grants; Risk; Sample Size; Sampling; Sampling Studies; Statistical Methods; Stem cell transplant; Subgroup; System; Techniques; Technology; Testing; The Cancer Genome Atlas; Therapeutic; Therapeutic Agents; Time; Tissue Sample; United States National Library of Medicine; aged; biomedical informatics; cancer type; chemotherapy; clinical practice; experience; follow-up; hazard; high dimensionality; improved; interest; mRNA Expression; multidimensional data; new therapeutic target; novel; prognostic; risk stratification; semiparametric; simulation; software development; survival outcome; survivorship; targeted treatment; therapeutic target