DOE Joint Genome Institute


May 14, 2007

DOE JGI Sets ‘Gold Standard’ for Metagenomic Data Analysis

WALNUT CREEK, CA–With the advent of more powerful and economical DNA sequencing technologies, gene discovery and characterization are transitioning from single-organism studies to revealing the potential biotechnology applications embedded in communities of microbial genomes, or metagenomes. The field of metagenomics is still in its infancy–the equivalent of the early days of the California Gold Rush, with labs vying to stake their claim.

Amidst the prospecting, the call has been issued for methods to separate fool’s gold from the real nuggets. Such a gold standard has now been provided through work led by the U.S. Department of Energy Joint Genome Institute (DOE JGI) with colleagues from Oak Ridge National Laboratory and IBM’s T.J. Watson Research Center. Their results are published in the May edition of Nature Methods.

“DOE JGI and our collaborators have pioneered the use of DNA sequencing-based technologies to understand microbial communities through a combination of computational and experimental methods,” said Konstantinos Mavrommatis, lead author of the paper and a post-doctoral fellow in DOE JGI’s Genome Biology Program. “We are now exploring ways to analyze metagenomic data to enable accurate classification of sequence fragments into their corresponding species populations. The goal is to reconstruct metabolic pathways by comparing with reference isolate genomes, so that we can model ecosystem dynamics using metabolic reconstructions of metagenomic data.

“However, so far all the methods that have been developed were aimed toward analyzing data coming from single, isolate genomes. In this instance, the situation is simple; we know what gene belongs to which organism. In metagenomes, it’s much more challenging, because you have sequences from many different organisms all mixed up, and moreover, you don’t have enough sequence from each to capture an accurate picture of the entire community, so you only get a glimpse of the identities of multiple genomes. All the publications to date have made the assumption that these tools will work as efficiently for metagenomes–but we really didn’t know.”

Nikos Kyrpides, DOE JGI Genome Biology Program Head, said that, to evaluate the magnitude of this problem and identify the inherent pitfalls, “we have constructed three simulated metagenomic datasets of varying complexity by mixing pieces of over one hundred already sequenced isolate organisms. This approach allowed us to quantify the fidelity of several data processing methods, since we could identify the correct answer by comparing the synthetic datasets to the corresponding isolate genomes.”

“This paper provides an extremely useful survey of tools and existing approaches for metagenome analysis and points out their weaknesses,” said Natalia Maltsev, of the Bioinformatics Group, Mathematics and Computer Science Division at Argonne National Laboratory. “The simulated datasets constructed by the authors provide a much-needed test bed for evaluation and comparisons of these tools. Their findings will no doubt have a very significant impact on the field of metagenomics in general. It will help groups like mine to choose efficient strategies for the development of automated methods for high-throughput metagenome analysis. And last, but not least, it will stimulate the development of new computational tools and approaches for studies of microbial communities.”

In a shotgun sequencing process, the DNA from the microbial genomes is first sheared into millions of small fragments to enable amplification, labeling, and ultimately sequencing. Genome assembly is the process of putting the sequenced fragments back in order, in effect, putting Humpty Dumpty back together again–to recreate the identity of an organism from the scattered puzzle pieces of DNA.
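As a rough illustration of the assembly idea described above, the following Python sketch merges fragments by repeatedly joining the pair with the longest suffix-to-prefix overlap. It is a toy under stated assumptions, not the algorithm used by Phrap or any other assembler named in this release: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example sequence are all hypothetical, and the fragments are assumed to be error-free.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Toy greedy assembly: keep merging the best-overlapping pair of fragments.

    Returns the list of sequences left when no pair overlaps by `min_len` or more.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Three overlapping fragments cut from one made-up sequence.
    pieces = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
    print(greedy_assemble(pieces))
```

Because this sketch joins fragments purely on shared sequence, it would also join fragments from different organisms that happen to share a stretch of DNA, which is one way mixed-up assemblies can arise in a metagenome.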

“One of the problems in assembling metagenomes,” said Mavrommatis, “is that you end up with large fragments of unknown accuracy and a substantial number of sequences that fail to fit onto those larger fragments. On many occasions, this information is not taken into account–depriving the analysis of valuable information embedded in that sequence.”

The solution proposed by Mavrommatis and his colleagues was to evaluate and compare the existing methods to see which performs best for the particular environmental samples being analyzed. “What we did was to take known sample genomes, shuffle them, create simulated metagenomes, and use those tools on them, and then we went back and compared the results to the isolate genomes. Essentially, we applied the gold standard–the truth–and found there were tools that shouldn’t have been used because their predictive accuracy was very low. But it also validated some of our assumptions.”
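The benchmarking approach described above can be sketched in a few lines of Python: cut reads from genomes whose identity is known, shuffle them into one pool, and score any method against the recorded origin of each read. This is a minimal sketch only, not the procedure used in the paper; the names `simulate_metagenome` and `accuracy`, the read length, and the toy genomes are assumptions made for illustration.

```python
import random


def simulate_metagenome(genomes, reads_per_genome, read_len, seed=0):
    """Build a simulated metagenome from known genomes.

    `genomes` maps an organism name to its sequence. Each returned item is a
    (read, source_name) pair, so the true origin of every read is retained.
    """
    rng = random.Random(seed)
    reads = []
    for name, seq in genomes.items():
        for _ in range(reads_per_genome):
            start = rng.randrange(len(seq) - read_len + 1)
            reads.append((seq[start:start + read_len], name))
    rng.shuffle(reads)  # mix the organisms together, as in a real sample
    return reads


def accuracy(reads, predict):
    """Fraction of reads for which `predict(read)` matches the known source."""
    correct = sum(1 for seq, truth in reads if predict(seq) == truth)
    return correct / len(reads)


if __name__ == "__main__":
    # Two made-up "genomes" with very different composition.
    toy_genomes = {"org1": "AT" * 40, "org2": "GC" * 40}
    pool = simulate_metagenome(toy_genomes, reads_per_genome=20, read_len=10)

    # A method under test: any function that maps a read to an organism name.
    def by_base(read):
        return "org1" if "A" in read else "org2"

    print(accuracy(pool, by_base))
```

Because the origin of every read is known, a poorly performing method shows up directly as a low score, which is the sense in which the simulated dataset serves as the truth.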

Mavrommatis said that, for example, when using the widely used sequence assembly tool Phrap, they saw artifacts that the program created by mixing sequences that should not have been mixed.

“It’s like when you’re in the market for a digital camera, you can go to web sites like CNET to see the reviews, make the comparisons, and get some guidance for choosing the right product for your particular needs.”

Another major problem with metagenomes is binning. Binning is the process of identifying the organism from which a particular sequence originated. There are several methods employed to bin sequences. BLAST (Basic Local Alignment Search Tool) is a method used to rapidly search for similar sequences in existing public databases.

Mavrommatis said that a popular approach is to take the sequence, BLAST it against the database, find the best hit, and assume that the queried sequence belongs to the same group of organisms as that hit. Other methods use intrinsic features of the sequence, such as oligonucleotide frequencies. Patterns of these features help to discriminate between the possible groups of organisms that contributed the sequence.
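To make the composition-based idea concrete, the Python sketch below assigns a read to whichever reference sequence has the most similar oligonucleotide (k-mer) frequency profile. It is an illustration under stated assumptions rather than any of the binning tools evaluated in the paper: the function names, the choice of k = 2, the Euclidean distance, and the toy reference sequences are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 2) -> dict:
    """Relative frequency of every k-letter word (oligonucleotide) in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def distance(p: dict, q: dict) -> float:
    """Euclidean distance between two frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def bin_read(read: str, references: dict, k: int = 2) -> str:
    """Return the name of the reference whose profile is closest to the read's."""
    read_profile = kmer_profile(read, k)
    return min(
        references,
        key=lambda name: distance(read_profile, kmer_profile(references[name], k)),
    )


if __name__ == "__main__":
    # Made-up reference sequences standing in for two groups of organisms.
    refs = {"at_rich": "ATATATATATATATAT", "gc_rich": "GCGCGCGCGCGCGCGC"}
    print(bin_read("TATATATA", refs))
    print(bin_read("CGCGCGCG", refs))
```

Unlike the best-hit approach, this kind of method does not need a similar sequence to exist in a database, but short reads carry few k-mers, so their profiles can be noisy.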

Several more methods have been proposed for binning, but none has proven satisfactory on its own, Mavrommatis said. “What we propose in the paper is a way to evaluate the appropriateness and accuracy of the binning methods using the same datasets in order to set a gold standard–we have designed the reference-simulated metagenome.”

Through the Nature Methods publication, Mavrommatis has invited others to contribute new methods as they arise, so that the evaluation stays current and the system retains its value. The effort is supported by a server called “Fames,” available at the DOE JGI (http://fames.jgi-psf.org/), where the community of researchers can check the most recent results, compare the dataset from their metagenome of interest against the simulated metagenome, and receive guidance on the optimal tools for analysis.

“Having such a tool at hand for the first time now, the community can not only compare the methods, but can also ask the question, why is this method better, or why does this one fail? Overall, we hope that it will help to improve the process and lead to further development of new methods for evaluating metagenomes, particularly since this gold rush is not going away any time soon.”

The other DOE JGI authors on the study are Natalia Ivanova, Kerrie Barry, Harris Shapiro, Eugene Goltsman, Asaf Salamov, Frank Korzeniewski, Miriam Land, Alla Lapidus, Igor Grigoriev, Paul Richardson, Philip Hugenholtz and Nikos Kyrpides.

The DOE Joint Genome Institute, supported by the DOE Office of Science, unites the expertise of five national laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with the Stanford Human Genome Center to advance genomics in support of the DOE mission related to clean energy generation and environmental characterization and clean-up. DOE JGI’s Walnut Creek, Calif., Production Genomics Facility provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.


