Leveraging massive comparative genomics to decode millions of mystery genes and lay a future-ready foundation
A sweeping analysis of nearly 2,000 genomes reveals just how much we still have to discover about Earth's fungal kingdom. [Image: Steven Ahrendt]
Over the past billion years, fungi have spread quietly throughout our planet. From the deepest ocean floors to Antarctic valleys, these organisms have diversified into an estimated 2–6 million species, dwarfing the diversity of plants. To put this evolutionary timescale in perspective, the genus Aspergillus has been evolving for 82 million years, making it nearly 30 times older than the genus Homo.
Yet despite their ubiquity and ancient origins, fungi remain largely mysterious. Only about 200,000 fungal species have been named. Among those sequenced, roughly half of all genes have completely unknown functions — like a vast library of books written in a yet-to-be-deciphered language.
Researchers with the Department of Energy (DOE) Joint Genome Institute (JGI)’s Fungal and Algal Science Program and JGI users have been at the forefront of tackling this enormous challenge, assembling one of the world's most comprehensive collections of fungal genomes. A sweeping analysis published in Nature Reviews Microbiology of nearly 2,000 of these genomes reveals just how much we still have to discover about Earth's fungal kingdom. The manuscript reviews about 200 publications linking evolutionary history and innovations across different classes of fungi.
“The enormous diversity of fungal genomes that has evolved over a billion years of evolution offers catalogs of enzymes, secondary metabolites, and other parts lists for biotechnology, bioenergy and biomaterials,” said Igor Grigoriev, head of the JGI’s Fungal and Algal Program and co-author of the paper. The JGI is a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory (Berkeley Lab). “These data can also be used for machine learning and artificial intelligence approaches for prediction of fungal lifestyles based on genome sequences.”
The Fungal Knowledge Gap
The fungal kingdom presents researchers with a sampling puzzle of extraordinary complexity. Beyond the sheer number of undiscovered species lies an even thornier problem: the genomes already sequenced don't adequately capture the breadth of fungal diversity.
Examining databases like MycoCosm, the JGI’s online fungal genomics resource platform, reveals a stark imbalance. More recently evolved fungi within Dikarya, which include familiar mushrooms and yeasts, outnumber their ancient relatives by nearly 10 to 1 in sequenced collections. Yet the early-diverging fungi actually comprise most of the diversity within the fungal kingdom, meaning they also likely harbor the most evolutionary innovations. Some entire lineages have only a single genome representative, while others remain completely unsampled.
This creates a bottleneck: Without genetic blueprints from diverse fungi, researchers struggle to identify and classify new species from environmental samples. Meanwhile, even well-studied genomes contain mysteries — including unknown genes that are conserved across the entire fungal kingdom, suggesting they perform crucial but uncharacterized functions.
The challenge facing scientists is both logistical and analytical. How can the research community systematically sample an ancient, hyperdiverse kingdom while simultaneously decoding the biological meaning hidden within millions of genes?
Fostering a Deeper Understanding
Fortunately, fungi are ideal subjects for large-scale genomic analysis due to their compact genomes. While plant and animal genomes can sprawl across billions of DNA base pairs, most fungal genomes contain fewer than 50 million — small enough that thousands can be sequenced and compared using modern technologies.
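As a rough, back-of-the-envelope illustration of that scale, the sketch below multiplies the article's figures (nearly 2,000 genomes, at most about 50 million base pairs each) and compares the total with a single human genome. The human genome size of roughly 3.1 billion base pairs is an assumption added here for comparison, and the constant and function names are illustrative only.

```python
# Back-of-the-envelope comparison of sequencing scale.
# Figures for fungi come from the article; the human genome size is an
# outside assumption (approximately 3.1 billion base pairs).

FUNGAL_GENOME_BP = 50_000_000     # upper bound for "most fungal genomes"
N_FUNGAL_GENOMES = 2_000          # "nearly 2,000" genomes, rounded up
HUMAN_GENOME_BP = 3_100_000_000   # assumed approximate size


def total_bases(n_genomes: int, bp_per_genome: int) -> int:
    """Return the total number of base pairs across a collection."""
    return n_genomes * bp_per_genome


collection_bp = total_bases(N_FUNGAL_GENOMES, FUNGAL_GENOME_BP)
human_equivalents = collection_bp / HUMAN_GENOME_BP

print(f"Upper bound for the whole collection: {collection_bp:,} bp")
print(f"Equivalent to about {human_equivalents:.0f} human-sized genomes")
```

Under these assumptions, the entire collection amounts to no more sequence than a few dozen human-sized genomes, which is why kingdom-wide comparisons are tractable.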
The JGI recognized this opportunity early, developing standardized workflows to produce high-quality, comparable genome assemblies across diverse species. The MycoCosm platform has become a central hub for fungal genomics, and is already being used as a training ground for the next generation of scientists. Initiatives like our 1000 Fungal Genomes Project established crucial phylogenetic frameworks spanning the entire kingdom.
Stephen Mondo, a research scientist with the JGI and co-author of the paper, described it as “a call to action for many communities,” pointing to educational partners, citizen scientists and the fungal scientific community as “potential routes to improve representation and understanding across the fungal kingdom.”
Through those efforts, researchers are able to leverage nearly 2,000 genomes to construct kingdom-wide evolutionary trees and overlay them with genomic features, revealing patterns that would be invisible when studying species in isolation. The comparative power has uncovered remarkable evolutionary stories: how ancient yeast lineages actually simplified over time by losing genes, how some fungi acquired entirely new capabilities through partnerships with bacterial symbionts, and how others revolutionized their biology by adopting genes from completely unrelated organisms.
The scale transforms individual genomes from static snapshots into dynamic stories of evolutionary innovation.
Building a Future-Ready Foundation
This massive collection of standardized fungal genomes positions the JGI to be a leader in applying artificial intelligence to genomics. The comparative datasets that revealed evolutionary patterns are now becoming training data for machine learning models that can predict biological capabilities directly from DNA sequences.
Other researchers have already demonstrated machine learning's potential in fungal genomics. Studies cited in the review show that machine learning techniques can successfully predict fungal lifestyles and ecological niches from genomic features alone. In one major project analyzing over 1,000 yeast genomes, machine learning identified specific genes associated with metabolic flexibility, distinguishing generalist species from specialists based on their genetic signatures.
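To make the idea of predicting a lifestyle from genomic features concrete, here is a minimal sketch of a nearest-centroid classifier. It is not the method used in the studies cited in the review, which the article does not describe in detail. The feature columns, lifestyle labels and every number below are invented for illustration and are not real genome data.

```python
from math import dist
from statistics import mean

# HYPOTHETICAL toy data: each tuple is one genome, described by counts of
# (carbohydrate-active enzyme genes, transporter genes, secondary-metabolite
# gene clusters). All values are made up for illustration.
TRAINING = {
    "saprotroph": [(120, 40, 30), (110, 45, 28), (130, 38, 35)],
    "symbiont": [(40, 60, 10), (35, 65, 12), (45, 55, 8)],
}


def centroids(training):
    """Average the feature vectors of each lifestyle class."""
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in training.items()
    }


def predict(features, cents):
    """Return the lifestyle whose centroid is closest (Euclidean distance)."""
    return min(cents, key=lambda label: dist(features, cents[label]))


cents = centroids(TRAINING)
print(predict((115, 42, 31), cents))  # a genome resembling the first class
print(predict((42, 58, 9), cents))    # a genome resembling the second class
```

Real analyses would draw on thousands of genomes and far richer feature sets, and would likely use more capable models. The sketch only shows the general shape of the task: genomic features in, predicted lifestyle out.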
The scale and quality of JGI data make our user facility well-positioned to provide the comprehensive, standardized infrastructure that makes such discoveries possible. The JGI's standardized workflows ensure that genomes are comparable across thousands of species, while the MycoCosm platform centralizes this data for researchers worldwide. As the JGI scales toward 10,000 annotated genomes, this infrastructure advantage becomes even more critical. With roughly half of all fungal genes still lacking functional annotations, the challenge in applying AI is having comprehensive enough training data to make accurate predictions across the entire fungal kingdom.
With that predictive power comes transformative potential. The paper highlights how fungi produce “an incredible array of enzymes and secondary metabolites, many of which hold great economic value,” with uses ranging from pharmaceuticals to biofuels to agricultural applications.
In conclusion, Grigoriev suggests that, “In the next few years, with advances in genomics and multi-omics, phenotyping and AI we should better understand fungal biology. Moving toward 10,000 fungal genomes and beyond will yield novel enzymes and metabolites, enable prediction of fungal traits and capabilities, and help develop sustainable bioeconomy solutions.”
In the Nature Reviews Microbiology article, Grigoriev and Mondo envision “a future wherein genome annotation is accompanied by hypotheses about fungal lifestyle and their roles in ecosystems.”
The JGI's comprehensive genomic infrastructure, combined with the power of artificial intelligence, promises to transform how researchers discover, understand and harness the biological capabilities hidden within Earth's most ancient and diverse eukaryotic kingdom.
The U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory, is committed to advancing genomics in support of DOE Office of Science Biological and Environmental Research (BER) missions. The JGI provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.
DOE’s Office of Science is the largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science/office-science.
MycoCosm
The MycoCosm web portal provides data access, visualization, and analysis tools for comparative genomics of fungi. MycoCosm enables users to navigate across sequenced fungal genomes, and to conduct comparative and genome-centric analyses of fungi and community annotation.