DOE Joint Genome Institute


Benchmarks

The complex intron-exon structure of eukaryotic genes makes them challenging to predict. The quality of gene prediction in eukaryotic genomes can be improved by combining different gene prediction approaches (ab initio, homology-based, EST-based, synteny-based, or combinations of these) with experimental data (transcriptomics, proteomics, etc.). In the course of annotating fungal genomes, we compared different gene predictors and annotation pipelines to assess and refine our annotation strategies for future genomes. Results of two such tests are presented here:

1. Annotation of the Heterobasidion annosum genome

Results: Several gene predictors and annotation pipelines were used to annotate the genome of the fungus H. annosum (assembly v1.0), and the accuracy of their gene predictions was compared on the basis of homology and EST support. The combination of tools used in the JGI annotation pipeline predicted the largest set of genes and had the best support on most measures.

| Measure | EuGene [1] | GeneMark [2] | FgenesH [3] | JGI Pipe [4,5] |
|---|---|---|---|---|
| Number of predicted gene models | 11,547 | 9,609 | 8,409 | 12,270 |
| with partial EST support | 5,544 | 3,829 | 4,567 | 5,248 |
| with full-length EST support | 2,538 | 1,182 | 2,896 | 3,073 |
| with homology support | 6,758 | 6,043 | 5,750 | 7,214 |
| with strong homology support (>80% aa identity, >80% coverage) | 112 | 109 | 174 | 187 |
| with homology and EST support | 2,894 | 2,172 | 2,720 | 2,953 |
| Average EST coverage per gene | 77.7% | 68.2% | 80.8% | 79.1% |
| Supported splice sites | 41,581 | 40,808 | 45,498 | 47,671 |
| Average homology coverage per gene | 64% | 60% | 68% | 69% |

EuGene models were built and provided by a collaborator. All of these models were used in the JGI pipeline. EST support was computed from 40,807 ESTs and 10,126 EST cluster consensus sequences mapped with BLAT; protein homology was computed with BLAST against NCBI NR.
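The page does not define exactly how per-gene EST coverage, supported splice sites, or the "strong homology" tier were computed. The sketch below shows one plausible way to derive them from BLAT-style alignment blocks and BLAST identity/coverage values. The interval representation, the function names, and the rule that a splice site counts as supported when an EST intron shares that boundary are all assumptions for illustration, not the JGI implementation; only the >80%/>80% thresholds come from the table.

```python
from typing import List, Set, Tuple

# Assumption: exons and EST alignment blocks are half-open [start, end)
# genomic intervals on the same strand of the same scaffold.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals so overlapping ESTs are not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def est_coverage(exons: List[Interval], est_alignments: List[List[Interval]]) -> float:
    """Fraction of the model's exonic bases covered by any aligned EST block."""
    blocks = merge([b for aln in est_alignments for b in aln])
    exonic = sum(end - start for start, end in exons)
    covered = sum(
        max(0, min(e_end, b_end) - max(e_start, b_start))
        for e_start, e_end in exons
        for b_start, b_end in blocks
    )
    return covered / exonic if exonic else 0.0


def introns(blocks: List[Interval]) -> Set[Interval]:
    """Gaps between consecutive blocks of one gene model or one EST alignment.

    Assumption: every gap is treated as an intron; a real pipeline would
    ignore short alignment gaps caused by indels or sequencing errors.
    """
    ordered = sorted(blocks)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def supported_splice_sites(exons: List[Interval], est_alignments: List[List[Interval]]) -> int:
    """Count the model's intron boundaries that coincide with an EST intron boundary."""
    est_introns = set().union(*(introns(aln) for aln in est_alignments))
    left = {start for start, _ in est_introns}   # donor side on the plus strand
    right = {end for _, end in est_introns}      # acceptor side on the plus strand
    return sum((start in left) + (end in right) for start, end in introns(exons))


def is_strong_homology(identity_pct: float, coverage_pct: float) -> bool:
    """Strong homology support as defined in the table: >80% aa identity and >80% coverage."""
    return identity_pct > 80 and coverage_pct > 80
```

For a two-exon model at [0, 100) and [200, 300), with one spliced EST aligned at [50, 100) and [200, 250) plus one unspliced EST at [60, 100), half of the exonic bases are covered and both boundaries of the single intron are supported.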

2. Comparison of MAKER and the JGI annotation pipeline

Results: The publicly available annotation pipeline MAKER [6] was compared with the JGI annotation pipeline [4,5]. For the basidiomycete Dichomitus squalens, the JGI pipeline predicted more genes, with better support across several lines of evidence.

| Measure | MAKER [6] | JGI Annotation pipeline [4,5] |
|---|---|---|
| Number of predicted gene models | 9,940 | 12,290 |
| with Swissprot hits | 6,521 | 7,356 |
| with non-repeat PFAM domains | 5,365 | 6,010 |
| with EST support | 9,252 | 10,796 |
| with >90% EST support | 7,729 | 9,178 |
| Number of unique PFAM domains | 2,207 | 2,245 |
| Average EST coverage per gene | 93.0% | 93.3% |
| Splice sites supported by ESTs | 99,627 | 102,200 |

Inputs: Assembly v1.0 of D. squalens, 359,410 protein seeds from NCBI NR, and 16,501 EST cluster consensus sequences mapped to the assembly by BLAT. MAKER used the following gene predictors: Exonerate, FgenesH (with the same parameters as in the JGI pipeline), and Augustus. All gene models were searched against the same Swissprot set of 530,264 protein sequences (downloaded July 5, 2011), the EST sequences, and the Pfam database (Pfam_v21).
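The rows of this table are aggregates over per-model evidence. A minimal sketch, assuming each gene model has already been annotated with its Swissprot hit status, its non-repeat Pfam domains, its EST coverage, and its count of EST-supported splice sites. The record type and field names are hypothetical, and averaging EST coverage only over EST-supported models is an assumption, since the page does not say which denominator was used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GeneEvidence:
    """Hypothetical per-model evidence record; field names are illustrative only."""
    swissprot_hit: bool                                   # any hit to the Swissprot set
    pfam_domains: Set[str] = field(default_factory=set)   # non-repeat Pfam domain IDs
    est_coverage: float = 0.0                             # fraction of exonic bases covered by ESTs
    est_supported_splice_sites: int = 0


def summarize(models: List[GeneEvidence]) -> Dict[str, float]:
    """Aggregate per-model evidence into rows like those in the comparison table."""
    with_est = [m for m in models if m.est_coverage > 0]
    return {
        "gene_models": len(models),
        "with_swissprot_hits": sum(m.swissprot_hit for m in models),
        "with_non_repeat_pfam": sum(1 for m in models if m.pfam_domains),
        "with_est_support": len(with_est),
        "with_gt90_est_support": sum(1 for m in models if m.est_coverage > 0.9),
        "unique_pfam_domains": len(set().union(*(m.pfam_domains for m in models))),
        # Assumption: averaged over EST-supported models only.
        "avg_est_coverage": (
            sum(m.est_coverage for m in with_est) / len(with_est) if with_est else 0.0
        ),
        "est_supported_splice_sites": sum(m.est_supported_splice_sites for m in models),
    }
```

Running `summarize` once on each pipeline's models, with the same reference databases for both, is what makes the two columns directly comparable.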

3. Comparative Analysis Methods and Tools

Motivation

Genome annotation and analysis require the development and validation of new algorithms and tools. Directions of this development include methods to analyze eukaryotic genome organization (tandem and segmental duplication, gene-based synteny, including across multiple related genomes), gene structure (intron conservation or loss across genomes), gene gain and loss (detecting possible errors in automated clustering results used for gene family analysis, building whole-genome phylogenetic trees from clustering results, and Pfam domain analysis to detect expanded and lost families), genome evolution, gene expression, genome variation, metabolic pathways, and regulatory elements. Another direction is testing new gene predictors (including those that use RNA-Seq data or synteny-based approaches) on validated gene sets for accuracy and speed, along with annotation pipelines (e.g., MAKER), repeat-finding software, and non-coding RNA-finding software. This project aims at (1) developing algorithms and prototypes of new genome analysis methods for publication and (2) testing new gene prediction and genome analysis tools for possible integration into the production annotation process.

Comparative Gene Modeling

Comparative gene modeling aims to improve the initial gene predictions for a set of closely related organisms and to correct missing or incorrectly predicted genes (incorrect splice sites, chimeras, gene fragments, etc.). The idea behind comparative modeling is that, in closely related genomes, most orthologs share the same conserved gene structure. The algorithm maps all gene models predicted in all genomes onto each individual genome and, at each locus, selects from the potentially many competing models the one that most closely resembles the homologous genes from the other genomes. This procedure may be iterated several times until the gene models no longer change.
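The selection and iteration steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the JGI algorithm: it assumes the mapping step has already been done, so each genome's candidate list for one orthologous locus holds its initial model first, followed by models mapped from the other genomes. It represents a gene structure by its coding exon lengths and scores structural similarity by shared splice junctions (in CDS coordinates) multiplied by the CDS length ratio; that scoring rule and all names here are hypothetical choices made for illustration.

```python
from itertools import accumulate
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a gene structure is a tuple of coding exon
# lengths (nt); its splice junctions are the cumulative exon boundaries.
Structure = Tuple[int, ...]


def splice_positions(model: Structure) -> Set[int]:
    """Positions of exon-exon junctions in CDS coordinates."""
    return set(list(accumulate(model))[:-1])


def structure_similarity(a: Structure, b: Structure) -> float:
    """Score in [0, 1]: Jaccard overlap of junctions times the CDS length ratio."""
    sa, sb = splice_positions(a), splice_positions(b)
    union = sa | sb
    junction_score = len(sa & sb) / len(union) if union else 1.0
    length_score = min(sum(a), sum(b)) / max(sum(a), sum(b))
    return junction_score * length_score


def comparative_modeling(candidates: Dict[str, List[Structure]],
                         max_iter: int = 10) -> Dict[str, Structure]:
    """Pick, per genome, the candidate most similar to the other genomes' choices.

    candidates maps genome name -> competing models at one orthologous locus,
    with the initial prediction first. Iterates until no model changes.
    """
    chosen = {g: models[0] for g, models in candidates.items()}
    for _ in range(max_iter):
        updated = {}
        for g, models in candidates.items():
            others = [m for h, m in chosen.items() if h != g]
            updated[g] = max(
                models,
                key=lambda m: sum(structure_similarity(m, o) for o in others),
            )
        if updated == chosen:  # converged: no gene model changed
            break
        chosen = updated
    return chosen
```

For example, if genomes A and B both carry a three-exon model with coding exon lengths (100, 200, 300) while genome C's initial model fused two exons into (300, 300), giving C the mapped three-exon model as a second candidate makes the procedure replace C's model with the three-exon structure.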

Results

For the basidiomycete Dichomitus squalens, reannotation using comparative modeling is compared with the initial JGI production annotation:

| Measure | JGI Annotation pipeline | Comparative modeling |
|---|---|---|
| Number of predicted gene models | 12,290 | 12,802 |
| with Swissprot hits | 7,356 | 7,900 |
| with non-repeat PFAM domains | 6,010 | 6,353 |
| with EST support | 10,796 | 11,105 |
| with >90% EST support | 9,178 | 9,444 |
| Number of unique PFAM domains | 2,245 | 2,322 |
| Average EST coverage per gene | 93.3% | 93.3% |
| Splice sites supported by ESTs | 102,200 | 104,246 |
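The size of each improvement is easier to judge as a relative change. This short calculation uses only the counts in the D. squalens comparison of the JGI pipeline and comparative modeling; the average EST coverage row, unchanged at 93.3%, is omitted, and the `gains` helper is a hypothetical name.

```python
from typing import Dict, Tuple

# Counts copied from the D. squalens table (JGI pipeline vs comparative modeling).
jgi_pipeline = {
    "Number of predicted gene models": 12_290,
    "with Swissprot hits": 7_356,
    "with non-repeat PFAM domains": 6_010,
    "with EST support": 10_796,
    "with >90% EST support": 9_178,
    "Number of unique PFAM domains": 2_245,
    "Splice sites supported by ESTs": 102_200,
}
comparative = {
    "Number of predicted gene models": 12_802,
    "with Swissprot hits": 7_900,
    "with non-repeat PFAM domains": 6_353,
    "with EST support": 11_105,
    "with >90% EST support": 9_444,
    "Number of unique PFAM domains": 2_322,
    "Splice sites supported by ESTs": 104_246,
}


def gains(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, Tuple[int, float]]:
    """Absolute and relative change for every row present in both tables."""
    return {
        row: (after[row] - before[row], (after[row] - before[row]) / before[row])
        for row in before
        if row in after
    }


for row, (delta, rel) in gains(jgi_pipeline, comparative).items():
    print(f"{row}: {delta:+,} ({rel:+.1%})")
```

For instance, the number of gene models rises by 512 (about 4.2%), and the number with Swissprot hits by 544 (about 7.4%).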

 

References

  1. Schiex T, Moisan A, Rouzé P. (2001) EuGène, an eukaryotic gene finder that combines several type of evidence. In Computational Biology: Selected Papers from JOBIM 2000, LNCS 2066. Springer-Verlag, pp. 118–133.
  2. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. (2008) Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18(12):1979-90.
  3. Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12.
  4. Grigoriev IV, Martinez DA, Salamov AA. (2006) Fungal genomic annotation. In Applied Mycology and Biotechnology, Vol. 6: Bioinformatics (Eds. Arora DK, Berka RM, Singh GB). Elsevier Press, pp. 123-142.
  5. http://genome.jgi.doe.gov/programs/fungi/FungalGenomeAnnotationSOP.pdf
  6. Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18(1):188-96.

 

Lawrence Berkeley National Lab Biosciences Area
A project of the US Department of Energy, Office of Science

JGI is a DOE Office of Science User Facility managed by Lawrence Berkeley National Laboratory

© 1997-2021 The Regents of the University of California