Awardee Organization: FLORIDA INTERNATIONAL UNIVERSITY
Description
Abstract Text
Project Abstract/Summary
More than 75% of the data generated from mass spectrometry (MS)-based omics experiments are wasted due to the
inefficiency of existing algorithmic methods that deduce peptides. The peptides that do get identified by existing
computational methods usually come from abundant proteins, and hence recent calls by scientists to study
overlooked proteins are gaining traction. These non-abundant and overlooked proteins might have the same (or
more) importance in human systems biology, health, and disease. Yet all downstream analysis and conclusions
related to human health are based on suboptimal and incomplete peptide deductions, indicating that formal investigation
is warranted and urgently needed. In the past decade, advances in machine-learning (ML) models have provided
a critical step and have made it possible to develop more accurate and deeper pipelines for MS data analysis. Our
preliminary work and experiments suggest that the limited training search space offered by labelled spectral libraries
compromises the robustness and generalizability of existing ML models, which may not work effectively on
real-world data. The overall objective of my research lab using this MIRA mechanism is to design and develop robust,
reliable, and generalizable machine-learning models for peptide deduction from MS data from omics experiments.
Our proposed work, pursued via this MIRA grant, addresses four key knowledge gaps in the development of ML models that, if
filled, will lead to superior computational techniques capable of inferring both abundant and non-abundant peptides.
Our general strategy will involve the design and development of generative models, self-learning models, biologically
inspired models, and methods for uncertainty quantification. In addition, we will focus on two key gaps in the
adaptation of ML models, which will be filled by developing ML-ready workflows and easy-to-use software
infrastructure that can be used by scientists. All this effort via the MIRA grant mechanism will fill a critical gap in our
understanding of, and ability to deduce, novel peptides, and will contribute a fundamental tool for studying complex
communities in proteomics and meta-proteomics data. At the end of this grant funding cycle, it is our expectation that
we will have designed and developed a highly accurate ML peptide deduction engine, capable of end-to-end analysis of
MS-based omics data, that is robust, generalizable, and more accurate than its algorithmic counterparts. Our
proposed work will facilitate reproducibility by developing ML models that perform well irrespective of underlying MS
data quality or completeness, which will be a highly impactful outcome. This proposed work will also serve as the
foundation for analysis of more complex data sets related to meta-proteomics and proteogenomics, one of our
long-term goals that we hope to achieve using this MIRA grant mechanism.
Public Health Relevance Statement
Project Narrative
The proposed research is relevant to public health because Mass Spectrometry (MS)-based omics can
allow systematic analysis of thousands of proteins, with the promise of discovering new proteins and biomarkers for
various disease conditions and of better understanding human systems biology. Artificial intelligence and
machine-learning tools are needed to explain the high-dimensional MS data, and such accurate tools will be instrumental in
elucidating the microbiome and cell function, which affect virtually all aspects of human health. Therefore, this MIRA
proposal is relevant to NIH’s broader mission, which supports fundamental and innovative research strategies that
can become the basis of protecting and improving human health.
Project Terms
Affect; Algorithms; Artificial Intelligence; Big Data; Biological Markers; Cell Physiology; Communities; Complex; Computational Technique; Computing Methodologies; Data; Data Analyses; Data Set; Development; Disease; Exhibits; Foundations; Funding; Goals; Grant; Health; Human; Investigation; Knowledge; Label; Learning; Libraries; Machine Learning; Mass Spectrum Analysis; Methods; Mission; Modeling; Outcome; Peptides; Predisposition; Proteins; Proteomics; Public Health; Reproducibility; Research; Scientist; Systems Biology; Traction; Training; Uncertainty; United States National Institutes of Health; Work; algorithmic methodologies; complex data; data quality; design; expectation; experimental study; generative models; high dimensionality; improved; innovation; machine learning model; metaproteomics; microbiome; novel; proteogenomics; software infrastructure; tool; virtual; wasting
No Sub Projects information available for 1R35GM153434-01