Protein annotations provide insights into environmental tendencies
Heart Spring in the Upper Geyser Basin of Yellowstone National Park is one of the sampling sites for the analysis. (NPS/Diane Renkin)

The Science

The Metagenome Stability Diagram, a tool within the Integrated Microbial Genomes and Microbiomes (IMG/M) platform, was developed to help researchers learn more about the patterns and functions of environmental microbes. Researchers can use the tool to correlate protein annotations with ratios of DNA sequence fragments from ecosystems. By selecting a protein identified in the system they study, they can see its predicted functional and metabolic roles across global metagenomes within IMG.

The Impact

The initial analysis highlights abundances of gene annotations that provide insights into the nutrients being used and cycled in different environments. It also shows how environmental conditions may affect the metabolic activity of microbiomes. These examples matter because they were picked from among the top ecosystem-differentiating protein-coding annotations across all IMG metagenomes. The work is of interest to the Department of Energy (DOE) Office of Science’s Biological and Environmental Research program because it helps scientists understand the assembly, function, and behavior of microbiomes. These insights could allow them to manipulate microbiomes to facilitate microbial solutions to challenging environmental problems, advancing the utility of microbiomes across the bioeconomy.

Summary

As described in a recent mSystems paper, a team at the DOE Joint Genome Institute (JGI), a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory, developed a statistical tool that helps researchers use gene markers across different ecosystems to better understand how nutrients are used and cycled. It can also show how environmental conditions constrain metabolic activity in microbiomes. The initial analysis used over 12,000 metagenomes from diverse environments and grouped together those with similar microbial communities. Similarity was based on the ratios of small DNA sequence fragments called tetranucleotides (sequences of four nucleotides), which serve as a measure of a microbiome’s collective genetic makeup. Microbes have specific tetranucleotide frequencies, and a given environment hosts particular organisms and metabolisms. This implies that microbiome (metagenome) tetranucleotide frequencies will be specific to the environment.
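To make the idea of tetranucleotide frequencies concrete, here is a minimal Python sketch that counts four-nucleotide windows in DNA sequences and converts the counts to ratios. It assumes each tetranucleotide is merged with its reverse complement, which leaves 136 distinct features; that number is consistent with the 136 frequencies the analysis reports, but whether the team collapsed them in exactly this way is an assumption. The function names are illustrative and are not part of IMG/M.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# The 256 possible tetranucleotides collapse to 136 canonical forms
# once each one is merged with its reverse complement.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(bases)) for bases in product("ACGT", repeat=4)}
)


def tetranucleotide_frequencies(sequences):
    """Return a dict mapping each canonical tetranucleotide to its relative frequency.

    Windows containing characters other than A, C, G, or T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 3):
            kmer = seq[i:i + 4]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return {kmer: 0.0 for kmer in CANONICAL_TETRAMERS}
    return {kmer: counts[kmer] / total for kmer in CANONICAL_TETRAMERS}


if __name__ == "__main__":
    # A toy sequence; real metagenomes contain millions of fragments.
    freqs = tetranucleotide_frequencies(["ACGTACGT"])
    print(len(CANONICAL_TETRAMERS), "canonical tetranucleotides")
    print({kmer: value for kmer, value in freqs.items() if value > 0})
```

Each metagenome would then be represented by one such vector of frequencies, which is the kind of input a dimensionality reduction step can work with.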

Machine learning also played a large part in creating the tool. The team used two methods: Linear Discriminant Analysis (LDA) and k-Nearest Neighbor (KNN) classification. LDA is a supervised data compression technique that was used to take the complex, high-dimensional data (the 136 tetranucleotide frequencies for each of the 12,063 metagenomes) and reduce it to two axes (LDA coordinates). KNN classification was then used to assign phase classifications in that two-dimensional space. These KNN-derived phase classifications can be seen in the tool, mapping out regions of similar tetranucleotide frequency.
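The KNN step can be illustrated with a small pure-Python sketch. It assumes the LDA reduction has already produced two-dimensional coordinates for each metagenome, and it classifies a new point by majority vote among its nearest labeled neighbors. The coordinates, ecosystem labels, and the choice of k = 3 are all invented for illustration; they are not values from the paper or from the tool.

```python
import math
from collections import Counter


def knn_classify(point, labeled_points, k=3):
    """Classify a 2D point by majority vote among its k nearest labeled points.

    labeled_points is a list of ((x, y), label) pairs. Ties in the vote go to
    the label that appears first among the nearest neighbors.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(
        labeled_points, key=lambda item: math.dist(point, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical LDA coordinates with made-up ecosystem labels.
TRAINING = [
    ((0.0, 0.0), "hot spring"),
    ((0.2, 0.1), "hot spring"),
    ((0.1, 0.3), "hot spring"),
    ((5.0, 5.0), "marine"),
    ((5.2, 4.9), "marine"),
    ((4.8, 5.1), "marine"),
    ((-4.0, 3.0), "soil"),
    ((-4.2, 3.1), "soil"),
]

if __name__ == "__main__":
    # A new metagenome whose (hypothetical) LDA coordinates fall near one cluster.
    print(knn_classify((0.1, 0.2), TRAINING, k=3))
```

Running the classifier over a grid of points in the two-dimensional space is one way to map out regions that share a classification, similar in spirit to the regions shown in the diagram.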

This work connects microbiome genotypes to environmental conditions. Because tetranucleotide frequencies can be plotted alongside ecosystem classifications, the results suggest that environmental conditions can be estimated from those frequencies. In short, the complexity of nutrients, constraints, and other factors in natural environments shapes specific genomic signatures that could be used to hypothesize environmental conditions.

The team plans to monitor and regularly update the Metagenome Stability Diagram. The tool is hosted on the National Energy Research Scientific Computing Center’s (NERSC) SPIN environment and linked from IMG.

Tool
Metagenome Stability Diagram in IMG/M Data portal

Contacts

PI Contact 

Matthew Kellom
Computational Biologist
Metagenome Program
DOE Joint Genome Institute
