Statistical methods for higher order dependences to understand protein functions
Project Number: 5R01GM144961-02
Contact PI/Project Leader: ZHOU, WEN
Awardee Organization: COLORADO STATE UNIVERSITY
Description
Abstract Text
This proposal brings together a strong team from molecular science and statistics to tackle the important
problem of how to integrate protein structure and sequence information in complex systems. Some of the
most important characteristics of these data are the strong correlations buried within them, with the
pairwise correlations in the sequence data already being routinely used to predict structural contacts. Here,
we are developing novel ways to use huge data sets to extract higher-order dependences. Extracting them from
sequences is now possible given the large volumes of sequence data available from genomics; in molecular
structures, such higher-order dependences are directly observable wherever groups of amino acids interact
directly. Importantly, these higher-order dependences reflect the dense physical environment in the cell,
which requires proper statistical characterization. A new model-free information-theoretic measure is
introduced to quantify the higher-order dependences and serves as the central method in this project. By
identifying the major challenges in drawing statistical inference based on
this measure, we develop, evaluate, and improve a new statistical inference and computational framework
for analyses of higher-order dependences with discrete data of a general type, motivated by the protein
multiple sequence data. The new computationally efficient framework makes it possible to discover reliable
higher-order dependences while quantifying uncertainty. The preliminary data here combine the
information from sequences and structures to yield unexpected results that immediately relate to the
dynamics of the protein structures. The outcome is an entirely new approach to handling the large volumes
of protein sequence data and other omics data now available, as well as the enormous volumes that omics
analysts are about to receive.
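The abstract does not define its information-theoretic measure, so the following is only an illustrative sketch of why pairwise correlations can miss higher-order dependences. It uses plug-in estimates of mutual information and of total correlation (a standard multivariate quantity, used here as a stand-in and not as the project's measure) on a hypothetical four-sequence toy alignment. The function names and the alignment are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Plug-in Shannon entropy (bits) of the joint distribution of alignment columns."""
    rows = list(zip(*columns))  # one tuple of residues per sequence
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Pairwise dependence between two columns: H(X) + H(Y) - H(X, Y)."""
    return entropy([x]) + entropy([y]) - entropy([x, y])


def total_correlation(columns):
    """Stand-in higher-order measure: sum of marginal entropies minus joint entropy."""
    return sum(entropy([c]) for c in columns) - entropy(columns)


# Hypothetical toy alignment: any two positions look independent,
# yet the third residue is fully determined by the first two.
msa = ["AAA", "ACC", "CAC", "CCA"]
cols = [list(c) for c in zip(*msa)]

pairwise = [
    mutual_information(cols[i], cols[j])
    for i in range(3)
    for j in range(i + 1, 3)
]
print("pairwise MI (bits):", pairwise)
print("total correlation (bits):", total_correlation(cols))
```

In this toy case every pairwise mutual information is zero while the three positions jointly share one bit, which is the kind of dependence that pairwise contact-prediction statistics cannot see. Plug-in estimates like these are biased for small samples; the sketch does not attempt the statistical inference or uncertainty quantification that the abstract describes.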
Public Health Relevance Statement
The new methods to be developed for analyzing large biological data sets can have a profound effect
across broad areas of public health, including novel prognosis, drug design, and new treatments. The
project results, novel methods, and databases will be disseminated through new web sites where the
algorithms can be applied and the data can be publicly accessed.