Despite decades of work by structural biologists, there are still ~5,200 protein families of unknown structure that lie outside the range of comparative modeling. It has recently been shown that this gap can be substantially reduced if the Rosetta structure prediction pipeline is augmented with residue-residue contacts inferred from evolutionary information. Such a significant boost in the number of proteins amenable to reliable modeling has been made possible largely by the accumulation of huge amounts of sequence data from rapidly growing metagenomics projects. The much greater volume of sequence information available today also makes it possible to formulate more specific questions about protein organization and function. This project aims to make use of all available metagenome and metatranscriptome sequence data (i) to expand the structural universe of eukaryotic proteins, (ii) to address the problem of structurally reconstructing protein-protein interaction networks, and (iii) to perform functional annotation of proteins based on their inferred structures and interaction partners. Microbes play crucial roles in maintaining the planet's biogeochemical cycles, and deciphering the 3D structures of their proteins and protein complexes is key to understanding how microorganisms function at the molecular level.
Proposer: David Baker, University of Washington
Proposal: Eukaryotic Protein Structure Determination Using Metagenome and Metatranscriptome Sequence Data