Phylogenetic Diversity

Image: Phylogenetic Tree, Chris Rinke

Genome sequencing has revolutionized our understanding of microorganisms and the roles they play in important processes, including pathogenesis, energy production, bioremediation, and global nutrient cycles, and of the origins, evolution, and diversity of life. Currently, there are approximately 15,000 complete or draft genome sequences of Bacteria and Archaea available. These genome sequences show a highly biased phylogenetic distribution when compared to the extent of microbial diversity known today. This bias has resulted in a major gap in our knowledge of microbial genome complexity and our understanding of the evolution, physiology, and metabolic capacity of microbes.

Under the umbrella of the Genomic Encyclopedia of Bacteria and Archaea (GEBA), the DOE JGI is sequencing thousands of bacterial and archaeal genomes from diverse branches of the Tree of Life. This phylogenomic approach is of great value in multiple areas of public and general scientific interest. The potential benefits include: (a) improved identification of protein families and ortholog groups across species, which will improve annotation of other microbial genomes; (b) improved phylogenetic anchoring of metagenomic data; (c) gene discovery (which tends to be maximized by selecting phylogenetically novel organisms); (d) a better understanding of the processes underlying the evolutionary diversification of microbes (e.g., lateral gene transfer and gene duplication); (e) a better understanding of the classification and evolutionary history of microbial species; and (f) improved correlations of phenotype and genotype in microbes.

GEBA-family Projects

The original GEBA project was completed in collaboration with the DSMZ. In a pilot study, the DOE JGI sequenced 53 bacterial and 3 archaeal novel and highly diverse genomes, representing a first step towards a phylogenetically balanced sequence space in the microbial tree of life (Wu et al., 2009, Nature). Approximately 200 additional GEBA genomes have since been sequenced, the bulk of which are of finished quality. A phylum-level GEBA study, the CyanoGEBA effort, sequenced the genomes of approximately 50 phylogenetically diverse cyanobacterial strains from all five morphological sections, including several from the never-before-sequenced Pleurocapsales and Stigonematales (Shih et al., 2013, PNAS). In an effort to sequence type strains and provide links between sequence data and phenotypic information, the GEBA type strain project is currently sequencing type strains of high phylogenetic diversity that were isolated from soil habitats, as well as strains associated with plant hosts. To study Root Nodulating Bacteria (RNB) and their symbioses with legumes after infection of root hairs, 100 RNB strains isolated from various locations around the world have been sequenced under the GEBA-RNB initiative. Lastly, the GEBA-MDM project makes use of the high-throughput single-cell genomics capabilities at the DOE JGI to explore the genomes of candidate phyla representatives. In a pilot study, approximately 200 single-cell genomes from 29 mostly underexplored branches of the tree of life were sequenced, shedding light on the metabolic potential and phylogeny of this microbial dark matter (Rinke et al., 2013, Nature).

GEBA type strain

Image: Type strains map from

The type strain commonly refers to the nomenclatural type, that is, the element of a taxon with which the name is permanently associated. In practice, this is usually a living culture that was chosen to represent a prokaryotic species when the species name was proposed. Because of their importance, type strains are usually carefully maintained in a number of culture collections throughout the world. Under the rules of nomenclature, the type strain of a species should not be identical or highly similar to the type strain of any other species.

The goal of the GEBA-type strain project is to generate a comprehensive genomic encyclopedia of the validly named bacterial and archaeal species in order to (i) catalog bacterial and archaeal diversity, (ii) unravel novel functions derived from novel protein families, and (iii) improve the binning and annotation of metagenomes. Type strains play a crucial role in defining the phylogenomic and taxonomic space of Bacteria and Archaea. They constitute the living cultures that serve as a fixed reference point for the assignment of bacterial and archaeal names and exhibit all the relevant phenotypic and genotypic properties cited in the original published taxonomic circumscriptions.

During the first phase of the GEBA-type strain study, we identified and sequenced 1,000 new phylogenetically diverse type strains. Our ongoing activities include scrutinizing this data set for novel functions, protein families, and undiscovered biosynthetic gene clusters, a key aspect of detecting novel natural products. Finally, we will be able to study the effect of our findings on metagenomic analyses.

PIs: Nikos Kyrpides, David Paez Espino, JGI; Hans-Peter Klenk, DSMZ Germany; Barny Whitman, University of Georgia.


Image: Nodules induced by Burkholderia on Lebeckia ambigua, John Howieson and Wayne Reeve.

Root Nodulating Bacteria (RNB) are soil-inhabiting bacteria that form symbioses with legumes after infection of root hairs. The GEBA-RNB initiative at the JGI is sequencing 100 RNB strains isolated from various locations around the world. This project will support systematic sequence-based studies of biogeographical effects on species evolution and of the mechanisms of symbiotic nitrogen fixation (SNF) by RNB. SNF is a significant asset for world agricultural productivity, the farming economy, and environmental sustainability.

Image: Nitrogen fixation by Burkholderia on Lebeckia ambigua (centre pot), John Howieson and Wayne Reeve.

SNF reduces the energy consumption required to produce nitrogenous fertilizer, saving US$6.8 billion per year. It significantly reduces greenhouse gas emissions compared to intensive agricultural practice using artificial nitrogen input. It also benefits the environment by reducing dry-land salinity, increasing soil fertility, and preventing waterway eutrophication.

Root endosymbioses promote carbon sequestration. Fungal and bacterial root endosymbioses share genetic mechanisms, and a detailed understanding of endosymbionts will help drive bioenergy development from trees. The GEBA-RNB project is based on a collaboration between JGI and an international consortium from 15 countries coordinated by Wayne Reeve from Murdoch University, Australia.

PIs: Nikos Kyrpides, JGI; Wayne Reeve, Murdoch University, Australia.


Image: Great Boiling Springs (Nevada), Robert Dodsworth

While the bulk of the microbial genomes sequenced to date are derived from cultivated bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes of these largely mysterious species referred to as the microbial dark matter.

Image: Main Great Boiling Spring in Nevada, Jeremy Dodsworth

The GEBA-Microbial Dark Matter (GEBA-MDM) project represents a natural extension of the original GEBA project and aims to use single-cell genomics to explore the uncultivated majority of Bacteria and Archaea. The first phase of GEBA-MDM targeted 201 single-cell representatives of 29 mostly underexplored branches of the tree of life, with a special focus on candidate phyla (phyla proposed on the basis of environmental sequences that have no cultivated representatives). We uncovered unexpected metabolic features, including a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria, and complete sigma factors in Archaea similar to those in Bacteria. Single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, improving the organism-level interpretation of ecosystem function. Overall, this study greatly expanded the genomic representation of the tree of life and provided a systematic step towards a better understanding of microbial evolution. Phase II of the GEBA-MDM project, currently underway, is seeking out new environments with high levels of candidate phyla representatives.

PIs: Tanja Woyke, Christian Rinke, Natalia N. Ivanova, Nikos C. Kyrpides, JGI; Phil Hugenholtz, Center of Ecogenomics, Australia; Alexander Sczyrba, Bielefeld University, Germany; Aaron Darling, University of Technology Sydney; Brian P. Hedlund, University of Nevada; Jonathan Eisen, University of California, Davis; George Tsiamis, University of Patras, Greece; Stefan M. Sievert, Woods Hole Oceanographic Institution; Wen-Tso Liu, University of Illinois at Urbana-Champaign; Steven J. Hallam, University of British Columbia; Ramunas Stepanauskas, Bigelow Laboratory for Ocean Sciences.


Wu D et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature. 2009 Dec 24;462(7276):1056-60. doi: 10.1038/nature08656.

Shih et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. PNAS. 2013 Jan 15;110(3):1053-8. doi: 10.1073/pnas.1217107110.

Rinke C et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013 Jul 25;499(7459):431-7. doi: 10.1038/nature12352.