By leveraging new tools and large-scale data mining, JGI researchers nearly tripled the available fungal mitochondrial genome data, unlocking new potential for AI-driven studies of energy metabolism.

The Science

Lactarius indigo (pictured) is one of hundreds of fungal species whose mitochondrial genomes were annotated in a recent study published in Nucleic Acids Research. (Image credit: Brian P. Looney)

Researchers produced nearly 10,000 new annotations of fungal mitochondrial genomes (the DNA found inside mitochondria, the energy-producing structures within cells). This is the largest dataset of its kind ever assembled. The team identified 15 core genes shared across all fungi, and found that some of these genes have moved from the mitochondria into the nuclear genome over evolutionary time. They also discovered more than 6,000 mitochondrial sequences previously unrecognized within existing public databases and recovered more than 3,000 others from environmental datasets. All of the annotations and data are publicly available through MycoCosm, the JGI’s web portal for fungal genome data.

The Impact

This dataset gives researchers a new foundation for studying how fungi produce and manage energy at the cellular level. Because the annotations are standardized and span the full kingdom, they are well-suited to train machine learning models that predict biological traits from DNA sequences. The study also confirmed that mitochondrial and nuclear evolution are closely aligned across fungi, validating mitochondrial genes as reliable markers for classifying species and studying diversity in complex biological systems. By making the tools and data available through MycoCosm — a platform widely used across the research community — the work lowers technical barriers for researchers who previously lacked access to high-quality mitochondrial annotations.

Summary

Mitochondrial genomes carry the genes responsible for oxidative phosphorylation, the central energy-producing process in complex cells. Despite their biological importance, these genomes have resisted large-scale automated annotation due to features like self-splicing introns and multiple genetic codes. Adding to the challenge, mitochondrial sequences are frequently embedded within nuclear genome assemblies in public databases, where they go unrecognized.

Researchers at the Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science user facility at Berkeley Lab, addressed this gap in a recent article published in Nucleic Acids Research. The team developed and used an annotation workflow that combines three independent gene-prediction methods to produce more complete gene models. The JGI workflow consistently recovered all 15 core mitochondrial genes across the dataset, whereas a widely used automated annotation tool missed at least one core gene in nearly half of those same genomes.
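The idea of merging several predictors and then checking for core-gene completeness can be sketched in a few lines of Python. This is an illustration only, not the JGI workflow: the three "methods" and the simple union rule are hypothetical, and the gene names are the set conventionally treated as core in fungal mitochondrial genomes, which the article itself does not list. Real gene-model merging also has to reconcile coordinates, introns, and genetic codes, which this sketch ignores.

```python
# Conventional fungal mitochondrial core set (an assumption; the article does
# not name the genes): 14 oxidative phosphorylation genes plus rps3.
CORE_GENES = frozenset({
    "atp6", "atp8", "atp9",
    "cob",
    "cox1", "cox2", "cox3",
    "nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6",
    "rps3",
})


def merge_predictions(*gene_sets):
    """Union of gene names called by any predictor (a deliberately simple rule)."""
    merged = set()
    for genes in gene_sets:
        merged |= set(genes)
    return merged


def missing_core_genes(predicted):
    """Return the core genes absent from a set of predicted gene names, sorted."""
    return sorted(CORE_GENES - set(predicted))


# Hypothetical output of three predictors run on the same genome.
method_a = CORE_GENES - {"atp8", "rps3"}   # misses two short genes
method_b = CORE_GENES - {"nad4L"}          # misses one gene
method_c = {"atp8", "rps3", "cox1"}        # finds only a few genes

merged = merge_predictions(method_a, method_b, method_c)
print("method_a alone is missing:", missing_core_genes(method_a))
print("merged set is missing:", missing_core_genes(merged))
```

In this toy case each method on its own is incomplete, while the merged set covers all 15 genes, which is the kind of gain the combined workflow is described as delivering.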

The team first applied the workflow to a curated reference set of over 300 genomes spanning the fungal kingdom's full taxonomic breadth. Across this set, the same 15 core genes (14 encoding energy-production proteins plus one ribosomal protein gene) were present in every genome examined. But while these core genes were the same across the kingdom, the genomes that carry them differed strikingly in both size and organization, ranging from just 12 kilobases in Rozella allomycis, an early-diverging parasitic fungus, to more than one million base pairs in desert truffles, more than twice the size of the largest fungal mitochondrial genome reported before this study.

A statistical comparison of evolutionary trees built from mitochondrial and nuclear genomes showed strong agreement across the kingdom, confirming that mitochondrial genes reliably reflect the broader evolutionary relationships among fungal species. The team also found evidence that in certain lineages, genes have migrated from the mitochondrial genome into the nuclear genome over time — supporting the notion that the boundary between these two genomes is not fixed, but continues to shift through ongoing evolutionary processes.

The team then extended their analysis beyond the reference set, using two additional approaches. To search environmental data, they developed a two-stage metagenomic detection pipeline. First, they scanned over 26,000 publicly available metagenomes in the JGI's IMG portal for mitochondrial ribosomal RNA sequences. Second, they applied metagenomic binning (grouping sequencing reads by their likely organism of origin) followed by phylogenomic analysis to confirm genuine fungal mitochondria. This pipeline recovered more than 3,000 fungal mitochondrial genomes from existing data without any new sequencing.
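The control flow of such a two-stage screen can be sketched as follows. Everything here is a hypothetical stand-in: the record fields, sample names, and the boolean "placed in fungi" flag are assumptions used for illustration, and the real rRNA search, binning, and phylogenomic steps are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    bin_id: str
    placed_in_fungi: bool  # hypothetical outcome of a phylogenomic placement


@dataclass
class Metagenome:
    name: str
    mito_rrna_hits: int  # hypothetical count of mitochondrial rRNA matches
    bins: list = field(default_factory=list)


def stage_one(metagenomes):
    """Stage 1: keep only metagenomes with at least one mitochondrial rRNA hit."""
    return [m for m in metagenomes if m.mito_rrna_hits > 0]


def stage_two(candidates):
    """Stage 2: keep bins whose phylogenomic placement falls within fungi."""
    return [
        (m.name, b.bin_id)
        for m in candidates
        for b in m.bins
        if b.placed_in_fungi
    ]


samples = [
    Metagenome("soil_A", 3, [Bin("bin1", True), Bin("bin2", False)]),
    Metagenome("marine_B", 0, [Bin("bin1", True)]),  # dropped in stage 1
    Metagenome("litter_C", 1, [Bin("bin7", True)]),
]

confirmed = stage_two(stage_one(samples))
print(confirmed)
```

Running the cheap rRNA screen first means the more expensive binning and phylogenomic checks only touch samples that already show a mitochondrial signal.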

In a parallel effort to search existing genome assemblies, the team used hidden Markov models (statistical tools trained on known gene sequences) to scan over 9,000 publicly available fungal nuclear genome assemblies in the National Center for Biotechnology Information's GenBank. The scan found that 6,467 of those assemblies, or roughly 70%, contained mitochondrial sequences that had previously gone unrecognized.
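One simple way to turn profile-search hits into a per-assembly call is sketched below. It assumes the hits have already been parsed into (assembly, contig, gene, e-value) tuples; the e-value cutoff and the requirement of three distinct genes on one contig are illustrative assumptions, not the criteria used in the study.

```python
from collections import defaultdict


def flag_assemblies(hits, max_evalue=1e-10, min_genes=3):
    """Return assemblies having a contig with at least min_genes distinct gene hits.

    hits: iterable of (assembly, contig, gene, evalue) tuples, for example
    parsed from an HMM search report. Both thresholds are illustrative.
    """
    genes_per_contig = defaultdict(set)
    for assembly, contig, gene, evalue in hits:
        if evalue <= max_evalue:
            genes_per_contig[(assembly, contig)].add(gene)
    return sorted({
        assembly
        for (assembly, _contig), genes in genes_per_contig.items()
        if len(genes) >= min_genes
    })


# Hypothetical hits for three assemblies.
hits = [
    ("asm1", "c5", "cox1", 1e-50),
    ("asm1", "c5", "cob", 1e-40),
    ("asm1", "c5", "atp6", 1e-30),
    ("asm2", "c1", "cox1", 1e-3),   # weak hit, filtered out by the cutoff
    ("asm3", "c2", "cox1", 1e-60),
    ("asm3", "c9", "cob", 1e-45),   # strong hits, but on different contigs
]

flagged = flag_assemblies(hits)
print(flagged)
```

Requiring several core genes on the same contig is one way to separate a genuine mitochondrial contig from isolated nuclear copies of mitochondrial genes, though the right threshold would need to be tuned on real data.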

In total, these efforts produced nearly 10,000 new annotations of fungal mitochondrial genomes, the largest dataset of its kind. The scale and standardized structure of the dataset position it as a resource for AI-driven approaches that predict fungal metabolic traits from genomic data and help explain how biological systems produce and manage energy.

By advancing the JGI’s strategic effort to generate 10,000 high-quality, AI-ready fungal genomes, this work directly supports the DOE’s Biological and Environmental Research (BER) mission to understand complex biological systems for energy and other applications.

All annotations, comparative tools, and visualizations are publicly available through the JGI's MycoCosm platform.


Contacts

BER Contact
Ramana Madupu, Ph.D.
Program Manager
Biological Systems Sciences Division
Biological and Environmental Research Program
Office of Science
Department of Energy
[email protected]

JGI Contact
Steven Ahrendt
Data Scientist
Fungal & Algal Program
[email protected]
