Computer Analysis Of Low-complexity Amino Acid And Nucle
Project Number: 1Z01LM000025-14
Contact PI/Project Leader: WOOTTON, JOHN C.
Awardee Organization: NATIONAL LIBRARY OF MEDICINE
Description
Abstract Text
The goal of this project is to define and analyze segments of protein and nucleotide sequences that show compositional bias, and to understand their structural, functional and evolutionary significance and their involvement in pathology. These sequences include local low-complexity regions or domains (among them conformationally mobile or intrinsically unstructured regions of proteins), tandemly repeated sequences, and more generally distributed bias in amino acid content. The latter can reflect directional mutation pressure at the genomic level as well as constraints specific to protein or domain function. Low-complexity regions account for a large proportion of genome-encoded amino acids; they may contain homopolymeric tracts, mosaics of a few amino acids, or repeated patterns, frequently subtle, including those typical of many non-globular domains. New mathematical definitions and algorithms are being developed to identify regions of compositional bias and to discover and analyze the properties of these regions that bear on their structures, interactions, biological functions and evolution. Proteins encoded by very AT-rich or GC-rich genomes, which include those of several important infectious disease organisms, show strong background compositional bias; this raises problems for sequence alignment algorithms, and these problems are being addressed. Local low-complexity regions and tandemly repeated amino acid sequences occur in many proteins involved in cellular differentiation and embryonic development, RNA processing, transcriptional regulation, signal transduction and aspects of cellular and extracellular structural integrity. Experimental data indicate that low-complexity segments of proteins are generally non-globular, intrinsically unstructured or conformationally mobile; however, knowledge of the molecular structures and dynamics of these domains is still very limited.
They are relatively intractable to investigation by crystallography and NMR, and they account for less than 1% of the residues in current structural databases. Hence, mathematically rigorous sequence analysis and ab initio quantum chemical methods, together with the relevant high-resolution structural data that are available, are the methods of choice for gaining insight into these regions of proteins and for raising questions to be investigated experimentally. For both nucleotide and amino acid sequences, these methods are also valuable for detecting and eliminating some artifacts in sequence database searches and alignment analysis.
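To make "compositional bias" and "eliminating artifacts" concrete, the following is a minimal sketch of one common way to flag low-complexity segments: compute the Shannon entropy of residue composition in a sliding window and mask any window whose entropy falls below a threshold. This is not the project's own algorithm; the function names `shannon_entropy` and `low_complexity_mask`, the window length of 12 and the 2.2-bit threshold are hypothetical, illustrative choices.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # Sum of -p * log2(p) over residue types that actually occur in the segment.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every residue covered by at least one low-entropy window.

    window and threshold are hypothetical, illustrative defaults.
    Sequences shorter than `window` are returned unchanged (uppercased).
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Mask the whole window, so flanks of a biased tract can be masked too.
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # A homopolymeric glutamine tract embedded between compositionally diverse flanks.
    protein = "ACDEFGHIKLMN" + "Q" * 20 + "PRSTVWYACDEF"
    print(low_complexity_mask(protein))
```

Masking a flagged segment before a database search keeps it from producing spurious high-scoring matches. For nucleotide sequences the maximum entropy is only 2 bits (four bases), so the threshold would need rescaling, and a mask character such as `N` would be used instead of `X`.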
No Sub Projects information available for 1Z01LM000025-14
Publications
No Publications available for 1Z01LM000025-14
Patents
No Patents information available for 1Z01LM000025-14
Outcomes
No Outcomes available for 1Z01LM000025-14
Clinical Studies
No Clinical Studies information available for 1Z01LM000025-14
News and More
Related News Releases
No news release information available for 1Z01LM000025-14
History
No Historical information available for 1Z01LM000025-14
Similar Projects
No Similar Projects information available for 1Z01LM000025-14