GPU-accelerated high-performance computing to supercharge foundational deep learning method development for scalable and accurate prediction of protein structures
Project Number: 3R35GM138146-05S1
Former Number: 5R35GM138146-05
Contact PI/Project Leader: BHATTACHARYA, DEBSWAPNA
Awardee Organization: VIRGINIA POLYTECHNIC INST AND ST UNIV
Description
Abstract Text
PROJECT SUMMARY/ABSTRACT:
This supplement aims to acquire a Dell high-performance computing (HPC) server with 8 NVIDIA H100
Graphics Processing Units (GPUs) to supercharge foundational deep learning method development for
scalable and accurate prediction of protein structures, paving the way to genomic-scale computational protein
modeling regardless of evolutionary relationships with previously annotated proteins. Artificial intelligence-
powered methods have led to a paradigm shift in computational modeling of protein structures, yet even the
most successful approaches for protein structure prediction fail to accurately predict structures of large multi-
domain proteins with complex topologies or proteins with short sequences, and they depend heavily on the
availability of evolutionary information, which is not always abundant, as with orphan proteins or rapidly
evolving proteins. Methods for structure prediction that use a single or a few homologous sequences remain
inaccurate and/or inefficient, limiting scaling to genomic protein databases. The latest advances in artificial
intelligence, such as foundational deep learning models, hold the key to addressing these limitations. The parent R35
grant of this supplement aims to develop cutting-edge deep learning models to automate genomic-scale
protein structure modeling with the key tasks of: (1) accurate de novo modeling of protein structures beyond
evolutionary relatedness, even with single-sequence input; (2) high-fidelity identification of remotely
homologous proteins despite low sequence similarity to previously annotated proteins; and (3) atomistic
refinement of predicted protein structures to drive them towards experimental resolution in terms of
stereochemical qualities and side-chain positioning. Our substantial progress in the first three years of the
project has demonstrated the feasibility and promise of our approach. However, training and testing
foundational deep learning models that leverage transformer neural network architectures on evolutionary-
scale molecular data require a large amount of GPU computing power. Using the GPU resources currently
available to us, it takes six months for a developer to complete the training and testing of one deep learning
method end to end. While such a speed can yield steady progress, it is not fast enough to unleash the power
of these advanced deep learning methods and realize the full potential and impact of the parent R35 project.
This supplement will enable us to acquire a high-performance computing server consisting of 8 NVIDIA H100
80GB GPUs to significantly speed up the research in the parent R35 project. The requested GPUs can
drastically reduce the time to complete the development of a deep learning method from about six months to less
than six weeks, thus dramatically improving the productivity of the developers and in turn accelerating
publication and dissemination of the methods and tools developed in this project. The large shared GPU
memory will enable us to train and optimize robust foundational deep learning methods powered by
transformers having millions of parameters, leading to scalable and accurate prediction of protein structures.
Public Health Relevance Statement
PROJECT NARRATIVE:
Genomic-scale annotation of protein structures is of vital importance in contemporary biomedical research for
understanding the molecular basis of complex diseases and facilitating structure-based drug design. This
project will acquire a Dell high-performance computing server with 8 NVIDIA H100 Graphics Processing Units
(GPUs) to supercharge foundational deep learning method development for scalable and accurate prediction
of protein structures, thus accelerating drug discovery and improving therapeutic strategies.