DOE Joint Genome Institute


Metabolomics Data Analysis – Tips From Users

Many of the standard procedures for processing ‘omics data sets (gene expression, protein abundance, ribosomal similarity, etc.) can be applied to metabolomics data as well. However, metabolites are unique in that they are the products of metabolism, whereas the other techniques measure the components that lay the foundation for metabolism to occur. Example analysis approaches by JGI metabolomics users are described below. These examples are not meant to provide in-depth teaching, but rather a starting point for how one might approach one's own analysis.

Daniel Caddell

Daniel Caddell is a Research Biologist at the US Department of Agriculture (USDA) Agricultural Research Service (ARS).

A useful first step in analyzing metabolomics data is to assess global trends in the data, beginning with the robustness of sample replicates. For this, a log-scale scatterplot can quickly be generated (in a spreadsheet program such as Microsoft Excel, or a programming language such as R) to compare ion abundances between sample replicates. If the quality of the samples is high, very few significantly different ion abundances should be observed between replicates (Figure 1A). Beyond checking replicates, the same method can be used to probe relative peak heights of individual metabolites for outliers whose ion abundances differ between sample type, location, or treatment, as seen in Figure 1B-C. However, if individual metabolites have not been quantified, these relative ion abundances are not suitable for absolute quantification (e.g., micrograms per gram of sample) or for comparisons between different metabolites, because ionization efficiencies differ and the biological matrix influences the signal.

Figure 1. Comparison of ion abundances between (A) replicates and (B) sample types. Each dot represents an individual metabolite present in the dataset, with red or blue filled dots indicating the metabolites that were more abundant in one dataset or the other (fold change > 2). (C) Single metabolites can also be analyzed before or after normalization.
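The replicate check described above can also be done numerically rather than by eye. The following is a minimal Python sketch, not the author's actual workflow: the function name `flag_replicate_outliers`, the metabolite names, and the peak heights are all hypothetical, and the 2-fold cutoff simply mirrors the threshold used in Figure 1.

```python
import math


def flag_replicate_outliers(rep_a, rep_b, fold_threshold=2.0):
    """Return metabolites whose ion abundance differs by more than
    fold_threshold between two replicates.

    rep_a, rep_b: dicts mapping metabolite name -> peak height.
    The result maps each flagged metabolite to its log2 fold change (a / b).
    """
    log_threshold = math.log2(fold_threshold)
    flagged = {}
    for name in rep_a.keys() & rep_b.keys():
        a, b = rep_a[name], rep_b[name]
        if a <= 0 or b <= 0:
            # The log ratio is undefined here; missing or zero values
            # need separate handling (for example a detection-limit floor).
            continue
        log2_fc = math.log2(a / b)
        if abs(log2_fc) > log_threshold:
            flagged[name] = log2_fc
    return flagged


# Hypothetical peak heights for two replicates of the same sample.
rep1 = {"citrate": 120000.0, "malate": 56000.0, "proline": 9000.0}
rep2 = {"citrate": 115000.0, "malate": 60000.0, "proline": 30000.0}

outliers = flag_replicate_outliers(rep1, rep2)
print(sorted(outliers))
```

Working on the log2 scale treats increases and decreases symmetrically, which is the same reason the scatterplot is drawn on a log scale. With high-quality replicates the returned dictionary should be nearly empty.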

Notably, normalization of the data to account for background signals present in extraction blanks can be accomplished by two different methods. First, the background signal present in extraction blanks can simply be subtracted from the corresponding ion abundance in the experimental samples. Alternatively, a value representing the lower detection limit in the dataset (e.g., ~4,000 in Figure 1) can replace any empty data points, either in extraction blanks or experimental samples, prior to normalization. The rationale for this substitution is that metabolites absent from a sample cannot be distinguished from metabolites present below the detection threshold. After normalization, metabolite peak height can be converted to percent relative abundance by setting the maximum peak height observed across all samples to 100%.

While searching for metabolites whose ion abundances have large fold changes is a useful heuristic for analyzing metabolomic data, it can be beneficial to further subset metabolites by a combination of heuristic thresholds, including significance (e.g., P < 0.05), fold change (e.g., 2 or more), and minimum intensity (e.g., 10x the background).
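The two background-handling options, the percent relative abundance conversion, and the combined thresholds can be sketched in a few lines of Python. This is an illustrative sketch only: all function names are hypothetical, missing values are assumed to be represented as `None`, clamping negative values to zero after blank subtraction is an assumption not stated in the text, and p-values are assumed to come from a separate statistical test.

```python
DETECTION_LIMIT = 4000.0  # example value from the text; dataset-specific


def subtract_blank(heights, blank_height):
    """Option 1: subtract the extraction-blank signal from each sample.
    Clamping at zero is an assumption made for this sketch."""
    return [max(h - blank_height, 0.0) for h in heights]


def fill_missing(heights, floor=DETECTION_LIMIT):
    """Option 2: replace empty data points (None) with the detection limit."""
    return [floor if h is None else h for h in heights]


def percent_relative_abundance(heights):
    """Scale one metabolite so its maximum peak height across samples is 100%."""
    top = max(heights)
    if top == 0:
        return [0.0 for _ in heights]
    return [100.0 * h / top for h in heights]


def passes_thresholds(p_value, fold_change, intensity, background,
                      alpha=0.05, min_fold=2.0, min_ratio=10.0):
    """Combine significance, fold change (either direction), and minimum
    intensity relative to background into a single filter."""
    big_change = fold_change >= min_fold or fold_change <= 1.0 / min_fold
    return (p_value < alpha
            and big_change
            and intensity >= min_ratio * background)


# Hypothetical peak heights for one metabolite across three samples.
raw = [None, 20000.0, 80000.0]
filled = fill_missing(raw)
relative = percent_relative_abundance(filled)
print(relative)  # [5.0, 25.0, 100.0]

print(subtract_blank([20000.0, 80000.0, 3000.0], 4000.0))
print(passes_thresholds(0.01, 3.5, 60000.0, 4000.0))
```

Keeping each step as a separate function makes it easy to swap one background strategy for the other and to see how much the choice changes the list of metabolites that survive filtering.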

Ryan Lenz

Pathway analysis with MetaboAnalyst (Ryan Lenz). MetaboAnalyst is a useful online interface that allows a researcher to conduct many different types of analysis (Xia et al. 2015). It is written in R, which allows advanced users to change statistical and imaging parameters if desired. For most analyses, however, the online interface is sufficient. To start, it is best to normalize the data before running comparative statistics such as t-tests and fold change. MetaboAnalyst also offers many choices for unsupervised and supervised modeling of metabolomic data and for selecting significant features.

Figure 2. Overall data differentiation between mock-inoculated and inoculated stem tissue. (A) Principal component analysis (PCA) and (B) heatmap visualization of all (~250) metabolic features.

Figure 2 shows a principal component analysis and a heatmap that summarize the data. From here, you can organize the fold-change table of all the metabolites and run it through enrichment and pathway analysis. This gives you a feel for the metabolic reactions most represented in the data. Once your data is uploaded, you can choose a pathway library from an assortment of model species including mammals, plants, and microbes. Figure 3 is an example of how MetaboAnalyst can organize the most impacted metabolic pathways from your data.

Figure 3. Metabolic pathways altered in inoculated stems, organized by pathway enrichment analysis (p-values) and pathway topology analysis (pathway impact).

MAGI (https://magi.nersc.gov) is another tool that can add a layer of biological relevance to metabolomic data. Generally, MAGI allows users to screen an organism’s genome for biochemical pathways involving a list of metabolites identified in metabolomics studies. In this way, users can confirm that significant features are plausibly produced by the organisms in their treatments. It can also help decipher the origin of identified metabolites in treatments involving more than one organism. For example, a significant metabolite from an experiment involving both a plant and a fungal pathogen was originally listed as putatively identified via LC-MS. This metabolite was screened with MAGI against both the plant and the fungus. It received a very low MAGI score for both organisms, which indicates that it most likely is not produced in that context and was probably misidentified. As a result, a user can reevaluate the m/z and retention times and select a biologically relevant metabolite for further analysis.

Candice Swift

Candice Swift is a graduate student in the O’Malley lab at UC Santa Barbara.

Molecular Networking (Candice Swift). Global Natural Products Social Molecular Networking (GNPS) [Wang et al. Nature Biotechnology 2016] is a powerful tool for visualizing metabolomics datasets. In a molecular network, each node represents an MS/MS spectrum for a particular m/z and retention time pair. Spectra are compared and given a cosine score between zero and one: a score of zero represents spectra without any similarity and a score of one represents a complete match. Nodes whose spectra score above a threshold (0.7 by default) are connected by edges, resulting in a network of clustered spectra. Mass differences between nodes can be used to gain structural insights into functional groups that may be present in the parent ions (for an example, see Fig. 2B of Watrous et al. PNAS 2012).
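To make the scoring idea concrete, here is a minimal Python sketch of a plain cosine score between two binned spectra, followed by edge creation at the 0.7 threshold. It is a simplified illustration rather than the GNPS implementation: GNPS applies its own peak matching and filtering, while this sketch only bins peaks by m/z. The function names, the 0.5 bin width, and the example spectra are all assumptions made for illustration.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.5):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width=0.5):
    """Cosine similarity of two binned spectra: 0 = no overlap, 1 = identical."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(spectra, min_cosine=0.7):
    """Connect every pair of nodes whose spectra score at or above min_cosine."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(sorted(spectra.items()), 2):
        score = cosine_score(spec_a, spec_b)
        if score >= min_cosine:
            edges.append((id_a, id_b, score))
    return edges


# Hypothetical MS/MS spectra as lists of (m/z, intensity) peaks.
spectra = {
    "n1": [(100.0, 10.0), (150.0, 5.0)],
    "n2": [(100.1, 9.0), (150.2, 6.0)],
    "n3": [(300.0, 10.0), (420.0, 7.0)],
}

edges = build_edges(spectra)
for id_a, id_b, score in edges:
    print(id_a, id_b, round(score, 3))
```

In this toy example only n1 and n2 share fragment bins, so they form the single edge, while n3 remains an unconnected node. Raising `min_cosine` yields fewer, tighter clusters; lowering it yields larger, looser ones.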

The GNPS data analysis pipeline used to create molecular networks has several useful features: 1) users can match unknown spectra to the GNPS compilation of spectral libraries, 2) it includes a built-in network visualization browser that allows visualization of clusters and comparison of experimental spectra to the spectra of known compounds in the libraries, and 3) it supports comparison of up to six different experimental conditions. This list is far from comprehensive, with more improvements and features constantly being added. Users are encouraged to explore GNPS for themselves. For more stringent library matching, be sure to adjust the mass difference tolerance, called Maximum Analog Search Mass Difference (default is 100.0).

Getting started is fairly straightforward, with video tutorials, in-depth documentation, and even regular office hours (see the main GNPS website). Parameters to consider adjusting when creating a network include the following (defaults given in parentheses): Min Pairs Cos (0.7), Maximum Connected Component Size (100), Minimum Cluster Size (2), and Maximum Analog Search Mass Difference (100.0).

When using GNPS, please cite Wang, Mingxun, et al. “Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.” Nature Biotechnology 34.8 (2016): 828-837. PMID: 27504778

Marc Chevrette

Phylogeny and Metabolic Similarity (Marc Chevrette). Metabolism is a complex trait shaped by ecological and evolutionary forces. As such, organismal metabolism can be explored in a phylogenetic framework to help explain underlying environmental (e.g., nutrient acquisition, flux) and species-species (e.g., host-microbe metabolic exchange, secondary metabolism) interactions. Gene-metabolite relationships (see the MAGI discussion above) in the context of phylogenies offer insight into the evolutionary histories of pathways and allow for comparisons across gene topologies, population structure, and ecology.

Chevrette MG, Carlos-Shanley C, Louie KB, Bowen BP, Northen TR and Currie CR (2019) Taxonomic and Metabolic Incongruence in the Ancient Genus Streptomyces. Front. Microbiol. 10:2170. doi: 10.3389/fmicb.2019.02170

 

Lawrence Berkeley National Lab Biosciences Area
A project of the US Department of Energy, Office of Science

JGI is a DOE Office of Science User Facility managed by Lawrence Berkeley National Laboratory

© 1997-2023 The Regents of the University of California