DOE Joint Genome Institute

Benchmarks

The complex intron-exon structure of eukaryotic genes makes them challenging to predict. The quality of gene prediction in eukaryotic genomes can be improved by combining different gene prediction approaches (ab initio, homology-based, EST-based, synteny-based, or combinations of these) with experimental data (transcriptomics, proteomics, etc.). In the course of annotating fungal genomes, we compared different gene predictors and annotation pipelines to assess and refine our annotation strategies for future genomes. The results of two such tests are presented here:

1. Annotation of the Heterobasidion annosum genome

Results: Several gene predictors and annotation pipelines were used to annotate the genome of the fungus H. annosum v1.0, and the accuracy of their gene predictions was compared based on homology and EST support. The combination of tools used in the JGI annotation pipeline predicted the largest set of gene models, with the strongest support on most metrics.

Metric                                                            EuGene [1]  GeneMark [2]  FgenesH [3]  JGI Pipe [4,5]
Number of predicted gene models                                       11,547         9,609        8,409          12,270
  with partial EST support                                             5,544         3,829        4,567           5,248
  with full-length EST support                                         2,538         1,182        2,896           3,073
  with homology support                                                6,758         6,043        5,750           7,214
  with strong homology support (>80% aa identity, >80% coverage)         112           109          174             187
  with homology and EST support                                        2,894         2,172        2,720           2,953
Average EST coverage per gene                                          77.7%         68.2%        80.8%           79.1%
Supported splice sites                                                41,581        40,808       45,498          47,671
Average homology coverage per gene                                       64%           60%          68%             69%

EuGene models were built and provided by a collaborator. All of the models were used in the JGI pipeline. EST support was computed from 40,807 ESTs and 10,126 EST cluster consensus sequences mapped with BLAT; protein homology was computed with BLAST against NCBI NR.
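The EST-based rows of the table above (average EST coverage per gene, supported splice sites) and the strong-homology criterion can be illustrated with a minimal sketch. It is not the JGI pipeline's code and rests on stated assumptions: coordinates are half-open intervals on the + strand, each EST alignment is a list of aligned genomic blocks (as BLAT reports them), gaps of at least a hypothetical `min_intron` of 20 bp between blocks are treated as introns, a splice site counts as supported when a model intron boundary coincides with an EST intron boundary, and homology coverage is measured relative to the predicted protein. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end) on the + strand (assumption)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def est_coverage(exons: List[Interval], est_blocks: List[Interval]) -> float:
    """Fraction of the model's exon bases overlapped by at least one EST block."""
    blocks = merge(est_blocks)  # disjoint, so overlaps can be summed without double counting
    total = sum(end - start for start, end in exons)
    covered = sum(max(0, min(ex_end, b_end) - max(ex_start, b_start))
                  for ex_start, ex_end in exons
                  for b_start, b_end in blocks)
    return covered / total if total else 0.0


def introns_from_blocks(blocks: List[Interval], min_intron: int = 20) -> Set[Interval]:
    """Gaps between consecutive blocks of one spliced alignment, treated as introns."""
    ordered = sorted(blocks)
    return {(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
            if b_start - a_end >= min_intron}


def supported_splice_sites(exons: List[Interval],
                           est_alignments: List[List[Interval]],
                           min_intron: int = 20) -> int:
    """Count model intron boundaries (both ends) that match an EST intron boundary."""
    est_introns: Set[Interval] = set()
    for blocks in est_alignments:
        est_introns |= introns_from_blocks(blocks, min_intron)
    est_starts = {start for start, _ in est_introns}
    est_ends = {end for _, end in est_introns}
    model_introns = introns_from_blocks(exons, min_intron=1)
    return sum((start in est_starts) + (end in est_ends) for start, end in model_introns)


def has_strong_homology(pct_identity: float, aligned_aa: int, protein_len_aa: int) -> bool:
    """Table criterion: >80% aa identity and >80% coverage (here: of the predicted protein)."""
    return pct_identity > 80.0 and aligned_aa / protein_len_aa > 0.80


# Toy example: a two-exon model and two EST alignments.
exons = [(100, 200), (300, 400)]
ests = [[(150, 200), (300, 350)], [(320, 400)]]
print(est_coverage(exons, [b for aln in ests for b in aln]))  # 0.75
print(supported_splice_sites(exons, ests))                    # 2
print(has_strong_homology(85.0, 450, 500))                    # True
```

Merging the EST blocks first keeps overlapping ESTs from inflating coverage; counting intron starts and ends separately lets a partially matching EST intron support one splice site without the other.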

2. Comparison of the MAKER and JGI annotation pipelines

Results: The publicly available annotation pipeline MAKER [6] was compared with the JGI annotation pipeline [4,5]. For the basidiomycete Dichomitus squalens, the JGI pipeline predicted more genes, with better support across several lines of evidence.

Metric                            MAKER [6]  JGI Annotation pipeline [4,5]
Number of predicted gene models       9,940                         12,290
  with Swissprot hits                 6,521                          7,356
  with non-repeat PFAM domains        5,365                          6,010
  with EST support                    9,252                         10,796
  with >90% EST support               7,729                          9,178
Number of unique PFAM domains         2,207                          2,245
Average EST coverage per gene         93.0%                          93.3%
Splice sites supported by ESTs       99,627                        102,200

Inputs: assembly v1.0 of D. squalens, 359,410 protein seeds from NCBI NR, and 16,501 EST cluster consensus sequences mapped to the assembly with BLAT. MAKER used the following gene predictors: Exonerate, FgenesH (with the same parameters as in the JGI pipeline), and Augustus. All gene models were searched with BLAST against the same Swissprot set of 530,264 protein sequences (downloaded July 5, 2011) and were also compared with the EST sequences and the Pfam database (Pfam_v21).
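The rows of the MAKER versus JGI comparison are aggregates over per-model evidence. The sketch below shows one way to compute them, assuming each model has already been searched against Swissprot and Pfam and assigned an EST coverage fraction. The `GeneEvidence` record, the repeat-domain set, and the domain names in the example are hypothetical. Two definitions are assumptions: unique Pfam domains are counted including repeat-associated ones, and average EST coverage is taken over EST-supported models only, which would be consistent with the reported 93.3% given that about 12% of JGI models (1,494 of 12,290) lack EST support.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GeneEvidence:
    """Hypothetical per-model evidence record (not a JGI or MAKER file format)."""
    model_id: str
    swissprot_hit: bool = False
    pfam_domains: Set[str] = field(default_factory=set)
    est_coverage: float = 0.0  # fraction of model bases covered by mapped ESTs, 0..1


def summarize(genes: List[GeneEvidence], repeat_domains: Set[str]) -> Dict[str, float]:
    """Aggregate per-model evidence into rows like those of the comparison table."""
    est_supported = [g for g in genes if g.est_coverage > 0]
    return {
        "Number of predicted gene models": len(genes),
        "with Swissprot hits": sum(1 for g in genes if g.swissprot_hit),
        # A model counts if at least one of its domains is not repeat-associated.
        "with non-repeat PFAM domains": sum(1 for g in genes if g.pfam_domains - repeat_domains),
        "with EST support": len(est_supported),
        "with >90% EST support": sum(1 for g in genes if g.est_coverage > 0.90),
        # Assumption: repeat-associated domains are included in the unique-domain count.
        "Number of unique PFAM domains": len(set().union(*(g.pfam_domains for g in genes))),
        # Assumption: averaged over EST-supported models only.
        "Average EST coverage per gene": (
            sum(g.est_coverage for g in est_supported) / len(est_supported)
            if est_supported else 0.0),
    }


genes = [
    GeneEvidence("g1", True, {"DomainA", "RepeatDomain"}, 0.95),
    GeneEvidence("g2", False, {"RepeatDomain"}, 0.50),
    GeneEvidence("g3", True),
]
for row, value in summarize(genes, repeat_domains={"RepeatDomain"}).items():
    print(f"{row:35s} {value}")
```

Running the same `summarize` call on the models from each pipeline, with identical search databases, keeps the two columns comparable.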

3. Comparative Analysis Methods and Tools

Motivation

Genome annotation and analysis require the development and validation of new algorithms and tools. Directions of this development include methods to analyze eukaryotic genome organization (tandem and segmental duplications, and gene-based synteny, including across multiple related genomes), gene structure (intron conservation or loss across genomes), gene gain and loss (detecting possible errors in automated clustering of gene families, building whole-genome phylogenetic trees from clustering results, and using Pfam domain analysis to detect expanded and lost families), genome evolution, gene expression, genome variation, metabolic pathways, and regulatory elements. Another direction is testing new tools on validated gene sets for accuracy and speed: gene predictors (including those that use RNA-Seq data and synteny-based approaches), annotation pipelines (e.g., MAKER), repeat-finding software, and non-coding RNA-finding software. This project aims to (1) develop algorithms and prototypes of new genome analysis methods for publication, and (2) test new gene prediction and genome analysis tools for possible integration into the production annotation process.

Comparative Gene Modeling

Comparative gene modeling aims to improve the initial gene predictions for a set of closely related organisms and to correct missing or incorrectly predicted genes (incorrect splice sites, chimeras, gene fragments, etc.). The idea behind comparative modeling is that, in closely related genomes, most orthologs share the same conserved gene structure. The algorithm maps all gene models predicted in all genomes onto each individual genome and, at each locus, selects from the potentially many competing models the one that most closely resembles the homologous genes in the other genomes. This procedure may be iterated several times until the gene models no longer change.
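The selection-and-iteration idea can be illustrated with a small sketch. It is not the JGI implementation: it assumes the cross-genome mapping step has already produced, for one orthologous locus, a list of candidate models per genome, each reduced to its exon lengths in CDS order. The dissimilarity (`structure_distance`: intron positions not shared, plus 1 if coding lengths differ), the tie rule (keep the current model), and all names are hypothetical choices. Each round scores every candidate against the models selected in the other genomes in the previous round, and the loop stops when no selection changes or after `max_rounds`.

```python
from typing import Dict, FrozenSet, List, Tuple

# A candidate model at one orthologous locus, reduced to its exon lengths (bp) in CDS order.
Model = Tuple[int, ...]


def intron_positions(model: Model) -> FrozenSet[int]:
    """Intron positions in CDS coordinates: cumulative exon length before each intron."""
    positions, total = set(), 0
    for exon_len in model[:-1]:
        total += exon_len
        positions.add(total)
    return frozenset(positions)


def structure_distance(a: Model, b: Model) -> int:
    """Hypothetical dissimilarity: unshared intron positions, plus 1 if CDS lengths differ."""
    unshared = len(intron_positions(a) ^ intron_positions(b))
    return unshared + (0 if sum(a) == sum(b) else 1)


def comparative_select(candidates: Dict[str, List[Model]],
                       initial: Dict[str, int],
                       max_rounds: int = 10) -> Tuple[Dict[str, int], int]:
    """Per genome, pick the candidate most similar to the other genomes' current picks;
    repeat until no pick changes. Returns (selected indices, rounds used)."""
    selected = dict(initial)
    for round_no in range(1, max_rounds + 1):
        # Score against the previous round's picks so the result does not depend on genome order.
        previous = {g: candidates[g][i] for g, i in selected.items()}
        updated = {}
        for genome, models in candidates.items():
            others = [m for g, m in previous.items() if g != genome]

            def score(idx: int) -> int:
                return sum(structure_distance(models[idx], other) for other in others)

            best = selected[genome]  # ties keep the current model
            for idx in range(len(models)):
                if score(idx) < score(best):
                    best = idx
            updated[genome] = best
        if updated == selected:
            return selected, round_no
        selected = updated
    return selected, max_rounds


candidates = {
    "A": [(300, 150), (100, 200, 150)],   # initial model misses the intron at CDS position 100
    "B": [(100, 200, 150)],
    "C": [(100, 200, 150), (100, 350)],   # initial model (index 1) misses the intron at 300
}
print(comparative_select(candidates, initial={"A": 0, "B": 0, "C": 1}))
# ({'A': 1, 'B': 0, 'C': 0}, 2)
```

In the toy locus, genomes A and C each start with a model that misses one of the two conserved introns; the first round switches both to the three-exon structure shared with B, and the second round confirms that nothing changes.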

Results

For the basidiomycete Dichomitus squalens, reannotation using comparative modeling is compared with the initial JGI production annotation:

Metric                            JGI Annotation pipeline  Comparative modeling
Number of predicted gene models                    12,290                12,802
  with Swissprot hits                               7,356                 7,900
  with non-repeat PFAM domains                      6,010                 6,353
  with EST support                                 10,796                11,105
  with >90% EST support                             9,178                 9,444
Number of unique PFAM domains                       2,245                 2,322
Average EST coverage per gene                       93.3%                 93.3%
Splice sites supported by ESTs                    102,200               104,246
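To put the D. squalens comparison in proportion, the differences can be computed directly from the table. This short worked example uses only values transcribed from the table; the derived quantities are absolute and percentage changes, plus the share of models with a Swissprot hit.

```python
from typing import Dict, Tuple

# (JGI annotation pipeline, comparative modeling), transcribed from the D. squalens table.
ROWS: Dict[str, Tuple[int, int]] = {
    "Number of predicted gene models": (12_290, 12_802),
    "with Swissprot hits": (7_356, 7_900),
    "with non-repeat PFAM domains": (6_010, 6_353),
    "with EST support": (10_796, 11_105),
    "with >90% EST support": (9_178, 9_444),
    "Number of unique PFAM domains": (2_245, 2_322),
    "Splice sites supported by ESTs": (102_200, 104_246),
}


def change(before: int, after: int) -> Tuple[int, float]:
    """Absolute and percentage change from the production annotation to reannotation."""
    return after - before, 100.0 * (after - before) / before


for name, (before, after) in ROWS.items():
    delta, pct = change(before, after)
    print(f"{name:32s} {delta:+7,d} ({pct:+.1f}%)")

# Share of all models with a Swissprot hit, to separate "more models" from "better models".
jgi_models, cm_models = ROWS["Number of predicted gene models"]
jgi_sp, cm_sp = ROWS["with Swissprot hits"]
print(f"Swissprot share: {100 * jgi_sp / jgi_models:.1f}% -> {100 * cm_sp / cm_models:.1f}%")
```

For example, comparative modeling adds 512 gene models (+4.2%) and 544 models with Swissprot hits (+7.4%), and the share of models with a Swissprot hit rises from 59.9% to 61.7%, so the gain is not only a matter of predicting more models.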

 

References:

  1. Schiex T, Moisan A, Rouzé P. (2001) EuGène: an eukaryotic gene finder that combines several sources of evidence. In: Computational Biology: selected papers from JOBIM 2000, LNCS 2066. Springer-Verlag, pp. 118–133.
  2. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. (2008) Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18(12):1979-90.
  3. Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12.
  4. Grigoriev IV, Martinez DA, Salamov AA. (2006) Fungal genomic annotation. In: Arora DK, Berka RM, Singh GB (eds.) Applied Mycology and Biotechnology, Vol. 6: Bioinformatics. Elsevier, pp. 123-142.
  5. http://genome.jgi.doe.gov/programs/fungi/FungalGenomeAnnotationSOP.pdf
  6. Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18(1):188-96.

 

Lawrence Berkeley National Lab Biosciences Area
A project of the US Department of Energy, Office of Science

JGI is a DOE Office of Science User Facility managed by Lawrence Berkeley National Laboratory

© 1997-2023 The Regents of the University of California