Improving Deposition Quality and FAIRness of Metabolomics Workbench
PROJECT SUMMARY (30 lines)
The practical reuse of genomics and transcriptomics datasets is well-demonstrated due to the use of
universal gene identifiers that facilitate matching of features across these datasets, high feature coverage,
standardized metadata and data deposition formats, and a maturity in deposition quality and consistency.
However, metabolomics datasets are much harder to reuse due to the lack of standardized metabolite
feature identification, heterogeneity in feature coverage, and high variability in deposition quality and
consistency. Therefore, it is much harder to both find relevant metabolomics datasets from repositories
like Metabolomics Workbench (MWbench) and effectively reuse these datasets to generate and/or test
hypotheses. To address these difficulties in reusing metabolomics datasets, deposition quality must be
improved. Furthermore, methods that enable the effective search and harmonization of MWbench studies are
needed, especially for integrative multi-omics analyses. We are the developers of the only set of available open-
source tools for parsing, generating, and validating mwTab formatted repository files. Our experience developing
and utilizing this open-source mwtab Python package makes us uniquely qualified to develop methods to
improve both deposition and FAIRness of MWbench studies. Also, we have provided periodic feedback to
MWbench based on systematic evaluations of the repository to enable the improvement of this growing public
resource (2). Therefore, we propose to develop methods and open-source tools that will improve deposition
quality and FAIRness of MWbench through the following specific aims: Aim 1: Enable comprehensive capture,
deposition, and validation of metabolomics experimental data and metadata; Aim 2: Improve the FAIRness of
Metabolomics Workbench while demonstrating effective multi-omics integration with the Genotype-Tissue
Expression Project (GTEx). The major innovations that this proposal will develop are: i) effective metadata
capture methods from unstructured formats, ii) advanced search methods for relevant MWbench studies that
can filter on metadata quality, iii) effective harmonization methods for MWbench studies, iv) a new omics
integration approach to detect human gene-metabolite associations, and v) new tools that facilitate public
deposition with high-quality metadata, with InChI tags, and in mwTab format for quicker, easier deposition.
The significance of this proposal lies in developing methods and tools that: a) comprehensively
capture, validate, and deposit metadata-rich metabolomics data, b) improve the FAIRness of MWbench datasets,
especially reuse, c) enable integration of MWbench and GTEx datasets to generate biomedically relevant human
gene-metabolite associations, and d) enable interpretation of gene-metabolite associations within molecular
interaction networks. These new tools will enhance the utility and usage of Metabolomics Workbench
while demonstrating multi-omics integration with the Genotype-Tissue Expression Project.
Public Health Relevance Statement
PROJECT NARRATIVE (3 sentences)
We will develop methods and tools needed to effectively archive and use public biomedical datasets describing
the thousands of small biomolecules (metabolomics) involved in human biology and disease. We will use
these methods to demonstrate the integration and reuse of thousands of datasets in both the Metabolomics
Workbench and the Genotype-Tissue Expression Project to derive new biomedical knowledge that describes
how gene products relate to the small molecules generated in metabolism, i.e., the chemical processes of life.
No Sub Projects information available for 1R03OD030603-01