DOE Joint Genome Institute


July 25, 2017

DOE User Facilities Join Forces to Tackle Biology’s Big Data

Inaugural Collaborative Science Call Yields Six Proposals Melding Genomics, Supercomputing

Users can look for patterns across data sets in the DOE JGI’s Integrated Microbial Genomes and Microbiomes (IMG/M) database with the help of NERSC’s supercomputer Cori. (Roy Kaltschmidt, Berkeley Lab)


Six proposals have been selected to participate in a new partnership between two U.S. Department of Energy (DOE) user facilities through the “Facilities Integrating Collaborations for User Science” (FICUS) initiative. The DOE Joint Genome Institute (JGI) and the National Energy Research Scientific Computing Center (NERSC), both at Lawrence Berkeley National Laboratory (Berkeley Lab), will give researchers access to supercomputing resources and computational science experts, helping them explore the wealth of genomic and metagenomic data generated worldwide and accelerate discoveries.

“As we bring researchers into the FICUS program, we are introducing a new user community to the power of supercomputers. Scientists will use whatever tools are readily available to investigate a hypothesis and to date, only a small set of biological tools have needed a supercomputer, but this is changing quickly,” says Kjiersten Fagnan, who serves a dual role as the DOE JGI’s Chief Informatics Officer and NERSC’s Data Science Engagement Group Lead.

The JGI-NERSC FICUS call is the latest partnership under the collaborative science initiative formed in 2014 by the Office of Biological and Environmental Research (BER) to harness the combined expertise and resources of two of the national user facilities stewarded by the DOE Office of Science in support of DOE’s energy, environment, and basic research missions. NERSC is the latest DOE User Facility to participate in FICUS, and other facilities may join in the future.

A sequence similarity network of a family of enzymes from the nitroreductase superfamily (some nitroreductases can reduce TNT, a significant soil contaminant). Nodes represent enzyme sequences, and edges connect pairs of sequences whose similarity is more significant than a BLAST E-value of 1e-42. Red and blue nodes represent enzymes from public sequence databases that belong to two sub-families, and white nodes represent sequences found only in the DOE JGI’s metagenomic database (IMG/M). Large nodes represent experimentally characterized enzymes of diverse functions. Notably, the sequence space expands significantly (from 300 enzymes to more than 10,000), revealing a potential new group of enzymes found only in IMG/M. “Linkers,” which are also unique to metagenomes, display sequence similarity to experimentally characterized enzymes of diverse functions and serve as attractive targets for synthesis and biochemical assays of intermediate function. (Eyal Akiva and Patsy Babbitt)

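The network in this caption follows a simple rule: each sequence is a node, and two nodes are joined when their BLAST E-value beats a fixed cutoff (here 1e-42). The Python sketch below illustrates that rule under stated assumptions: the all-vs-all BLAST hits are assumed to be already parsed into (query, subject, E-value) tuples, and the function names and sequence IDs are hypothetical. It is not the Babbitt group's actual pipeline. Connected components of the resulting graph approximate the clusters seen in such a figure.

```python
def build_ssn(hits, evalue_cutoff=1e-42):
    """Build a sequence similarity network as an adjacency map.

    hits: iterable of (query_id, subject_id, evalue) tuples, assumed to come
    from an all-vs-all BLAST search (hypothetical input format).
    """
    graph = {}
    for query, subject, evalue in hits:
        # Every sequence becomes a node, even if none of its hits pass the cutoff.
        graph.setdefault(query, set())
        graph.setdefault(subject, set())
        # Smaller E-values are more significant; keep only edges below the cutoff.
        if query != subject and evalue < evalue_cutoff:
            graph[query].add(subject)
            graph[subject].add(query)
    return graph


def clusters(graph):
    """Return connected components (groups of linked sequences) via depth-first search."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Toy hits with made-up IDs: "env1" stands in for a metagenome-only sequence.
hits = [
    ("seqA", "seqB", 1e-80),
    ("seqB", "seqC", 1e-50),
    ("seqC", "seqD", 1e-10),  # not significant enough at 1e-42: no edge
    ("env1", "seqD", 1e-60),
]
graph = build_ssn(hits)
for component in clusters(graph):
    print(sorted(component))
```

Making the cutoff more stringent (a smaller E-value) splits the network into more, tighter clusters, while relaxing it merges sub-families together.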

Through the JGI-NERSC FICUS call, users can query all available data in the DOE JGI’s Integrated Microbial Genomes and Microbiomes (IMG/M) database and look for patterns across data sets with the help of NERSC’s supercomputer Cori, enabling larger-scale analyses with greater potential for novel discoveries. Because many of these researchers are new to supercomputing, a member of NERSC’s Data Science Engagement Team will be assigned to work with each FICUS project, and DOE JGI staff will assess their needs and help them develop tools and workflows. Ultimately, these tools and scientific findings will be made publicly available via a NERSC science gateway.

From samples to protein structures and complexes. Center: Researchers gathering samples from Great Boiling Spring in Nevada (image of Great Boiling Spring by Brian Hedlund, UNLV). Left: A cartoon of aligned metagenomic sequences; each row is a different sequence. Each position is compared with every other position to detect patterns of co-evolution, which are used to predict contacts (yellow line). Top: A contact between two amino acids. Right: Contacts within a protein. Bottom: Contacts between proteins. (Protein structure and composite image by Sergey Ovchinnikov, UW)

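The co-evolution idea in this caption can be illustrated with mutual information (MI) between alignment columns: positions whose residues change together score high and become candidate contacts. The Python sketch below is only a minimal illustration with a made-up toy alignment and hypothetical function names. Plain MI is the simplest co-evolution score and is confounded by indirect correlations and shared ancestry, which is why contact-prediction methods used in practice fit global statistical models instead.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (natural log) between two alignment columns."""
    n = len(col_i)
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_position_pairs(alignment):
    """Score every pair of positions; higher MI suggests co-evolution (a candidate contact)."""
    length = len(alignment[0])
    # Transpose the alignment: columns[k] holds the residues at position k in every sequence.
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    scored = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up toy alignment: positions 0 and 1 always change together (A-K vs G-R).
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
best_score, i, j = rank_position_pairs(alignment)[0]
print(f"top pair: positions {i} and {j}, MI = {best_score:.3f}")
```

Running it prints `top pair: positions 0 and 1, MI = 0.693`, since the two perfectly coupled two-state columns share exactly ln 2 nats of information, while every other pair scores zero.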

The accepted proposals include:

  • Patricia (Patsy) Babbitt of the University of California (UC), San Francisco aims to develop tools to mine the IMG/M database for enzyme superfamilies: functionally diverse collections of enzymes that share a common ancestor and fold, as well as active-site architecture and a reaction mechanism or other chemical capability. By profiling certain enzyme families from different environments, it may be possible to identify associations that suggest functions for otherwise unknown proteins. As test cases, the Babbitt group is working with enzyme superfamilies involved in the biodegradation of insecticides, heavy metals and explosives.
  • David Baker at the University of Washington will access the metagenomic and metatranscriptomic data sets available in the IMG/M database to expand the structural universe of eukaryotic proteins. By mining the raw and annotated genome sequences, the team hopes to find more homologs within protein families, which can then be used to develop computational methods that build accurate models of how the proteins fold, providing testable clues to potential functions. The proposal builds upon a previous collaboration in which Baker’s lab used the sequence data in the IMG database to generate accurate 3D structure models for 614 protein families (12 percent of which had not yet been structurally characterized).
  • Phillip Brooks of UC Davis proposes to speed up comparative genome sequence analysis by using a technique called MinHash to calculate “signatures,” first for more than 5,000 private microbial genomes and then for all of the metagenomes in the IMG/M database. The resulting signature indexes would be a step toward technologies for faster and more accurate taxonomic organization of the genomes contained in metagenomes, enabling more informative comparative analyses of metagenomic data sets.
  • Ed DeLong of the University of Hawaii at Manoa aims to develop a global catalog of microbial small RNAs (sRNAs)—highly structured, non-coding RNA molecules, 50 to 500 bases in length—from the publicly available metatranscriptomes and metagenomes in IMG/M, as well as data sets generated from a two-year time-series study by his own lab. Microbial sRNAs function as regulators of metabolic processes, and many are currently known to be involved in environmentally significant processes.
  • Steve Hallam of Canada’s University of British Columbia aims to reconstruct modular pathways mediating the core biogeochemical cycles of elements such as carbon, nitrogen, sulfur and iron. His team has already mapped a subset of phylogenetic reference trees for carbon and nitrogen metabolic pathways, but they want to develop a scalable process for charting global biogeochemical cycles using fast phylogenetic mapping of functional anchor genes from the publicly available metatranscriptomes and metagenomes in IMG/M. This work will provide a community-driven framework for reconstructing the interconnected network of microbially mediated biogeochemical cycles with quantitative taxonomic resolution, and it will inform modeling efforts to predict microbial community responses to environmental perturbation.
  • Kostas Konstantinidis of the Georgia Institute of Technology wants to develop new approaches to analyze soil microbial communities at the individual population level. The team will start by sequencing, assembling and binning genome populations from permafrost soil metagenomes of samples collected at the Carbon in Permafrost Experimental Heating Research (CiPEHR) site near Alaska’s Denali National Park, and then compare the resulting population data with metagenomes in the IMG/M database. They hope to determine which populations are widespread within specific soil ecosystems, to establish those populations as model organisms for studying carbon cycling in the corresponding ecosystems, and to identify which gene functions are differentially abundant, and thus selected for, in different ecosystems.
The Alaska tundra site studied, with a schematic below of the metagenomic pipeline used to identify sequence-discrete populations. (Adapted from Rodriguez-R and Konstantinidis, Microbe Magazine, 2014 and provided by Kostas Konstantinidis)

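MinHash, the technique named in the UC Davis proposal above, estimates how similar two genomes are without aligning them: each sequence is reduced to its set of k-mers (substrings of length k), and a short signature of minimum hash values is kept, so that the fraction of matching signature entries approximates the Jaccard similarity of the two k-mer sets. The Python sketch below illustrates the classic form that uses many hash functions; it is not the proposal's actual software, and the k-mer size (21), signature length (128), and all function names are assumptions for illustration. Many production tools use faster single-hash (bottom-k or scaled) variants.

```python
import hashlib
import random


def kmers(seq, k=21):
    """Set of all overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; each seed (used as the salt) acts as a different hash function."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def signature(seq, num_hashes=128, k=21):
    """MinHash signature: the minimum hash of the k-mer set under each hash function."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(s, seed) for s in shingles) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching minima estimates the Jaccard similarity of the k-mer sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(seq_a, seq_b, k=21):
    """Exact Jaccard similarity, for checking the estimate on small examples."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


# Synthetic sequences (not real genomes): the chimera shares its first half with genome_a.
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
genome_b = "".join(rng.choice("ACGT") for _ in range(2000))
chimera = genome_a[:1000] + genome_b[1000:]

sig_a, sig_b, sig_c = (signature(s) for s in (genome_a, genome_b, chimera))
print("estimated:", estimate_jaccard(sig_a, sig_c))
print("exact:    ", exact_jaccard(genome_a, chimera))
```

Because a signature has a fixed length regardless of genome size, signatures for thousands of genomes and metagenomes can be stored and compared far more cheaply than the sequences themselves, which is the appeal of precomputing them across a large database.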

According to Fagnan, the diversity of NERSC’s science workload, which ranges from cosmology to nanoscience, makes the facility an ideal partner for this FICUS initiative. Additionally, NERSC has a long history of collaborating with the DOE JGI, so its staff is very familiar with the DOE JGI’s data and computational needs. Under a memorandum of understanding revised in 2011, NERSC has been providing high-performance and high-throughput computing support for the DOE JGI, and it currently stores all of the Institute’s data. Over the years, the two facilities have also worked closely to make JGI’s sequence data processing and integration pipelines, as well as data processing tasks such as sequencing quality control for base calling, contamination detection, sequence alignment, assembly and gene prediction, run efficiently on the center’s supercomputers. Most recently, the DOE JGI has dramatically improved the performance of metagenome assembly by leveraging Cori’s burst buffer resource.

“I really believe that the future of computing is going to be dominated by biology. The volumes of biological data that need to be synthesized, aggregated and interrogated will require supercomputers,” said Fagnan. “If you look at the data sets being generated and the questions that people have, you can see that researchers are going to have to combine different datasets—like genomics, metabolomics, protein crystal structures and potentially even brain scans and more—to find answers. This work cannot be done on a laptop or small cluster.”

The full list of approved projects is available at http://jgi.doe.gov/our-projects/csp-plans/fy-2017-csp-plans/#jgi-nersc.

* * *

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab’s Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.


The U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. JGI provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges. Follow @jgi on Twitter.

DOE’s Office of Science is the largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

