Exploration of DNA functionality using language models
Project Number: 1DP2LM014811-01
Former Number: 1DP2OD036767-01
Contact PI/Project Leader: FAN, XIAO
Awardee Organization: UNIVERSITY OF FLORIDA
Description
Abstract Text
Project Summary
Deoxyribonucleic acid (DNA), carrying genetic instructions for the growth, functioning, and
reproduction of most living organisms, presents a critical avenue for exploring evolutionary
history, disease etiology, personalized medical interventions, and beyond. However, our grasp of
the role played by DNA sequences, especially non-coding DNA, remains limited. Unraveling this
mystery proves challenging due to the labor-intensive and resource-demanding nature of
experiments required for its decoding. Although computational techniques have arisen, they
grapple with obstacles such as insufficient training data to unlock the secrets within DNA
sequences. Inspired by recent advancements in natural language processing, we recognize
significant prospects for employing a similar approach to study DNA sequences as a form of
biological language. Our central concept revolves around developing an advanced DNA
language model. Aligning with the trajectories of its linguistic and protein structural counterparts,
our model holds the potential to reinvigorate the research of DNA structure and functionality.
This model serves as a versatile tool, enabling the exploration of various functions residing
within DNA sequences using an innovative multi-task learning architecture. Our initial
exploration focuses on unveiling the fundamental mechanisms driving DNA regulation, and the
same multi-task framework allows the model to be adapted to other DNA functions.
Additionally, the model facilitates the understanding of the functional impacts of non-coding
variants, which has profound implications for genetic testing. Our innovative framework could
significantly expand the scope of variants that can be reported in genetic testing, thereby
enhancing our ability to identify genetic contributions to health and disease and guiding
personalized medical interventions.
Public Health Relevance Statement
Project Narrative
This research holds significant relevance to public health by advancing our understanding of
non-coding DNA functions and variant effects, key factors in genetic regulation and disease
development. The innovative use of large language models and multi-task learning enhances
our ability to interpret genetic variants, potentially improving diagnostic accuracy and guiding
personalized medical interventions, thereby contributing to more effective disease prevention,
treatment, and genetic counseling.
Project Terms
Architecture; Automobile Driving; Biological; Computational Technique; DNA; Development; Disease; Etiology; Genetic; Genetic Counseling; Growth; Health; Instruction; Intervention; Language; Linguistics; Medical; Modeling; Natural Language Processing; Nature; Organism; Play; Proteins; Public Health; Recording of previous events; Regulation; Reporting; Reproduction; Research; Resources; Role; Structure; Untranslated RNA; Variant; diagnostic accuracy; disorder prevention; experimental study; genetic testing; genetic variant; grasp; improved; innovation; large language model; multi-task learning; tool; training data