Stand Genomic Sci 9, 2 (2014)
Although Escherichia coli is the most widely studied bacterial model organism and is often considered the model bacterium per se, its type strain has until now been absent from microbial genomics. As part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083T together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. The affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, the results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla, with minor exceptions, and place DSM 30083T in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 differs strongly from the type strain, not only genomically but also physiologically. The phylotypes do not, however, show a uniform level of character divergence as measured using dDDH; an alternative arrangement is therefore proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from more than 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
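To make the two measures named above concrete, the following is a minimal Python sketch, not the pipeline used in this study. It computes genomic G+C content from a sequence string and groups strains whose pairwise dDDH value reaches a subspecies threshold. The function names `gc_content` and `group_by_dddh`, the single-linkage grouping rule, and the choice of 79.0 (the lower end of the 79-80% range) as the default threshold are all assumptions made for illustration. The dDDH values are assumed to be supplied by an external tool.

```python
from itertools import combinations

# Assumption: lower end of the 79-80% dDDH range stated in the text.
SUBSPECIES_DDDH = 79.0


def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


def group_by_dddh(strains, dddh, threshold=SUBSPECIES_DDDH):
    """Group strains by single linkage: two strains are joined when their
    pairwise dDDH value (in percent) is at or above the threshold.

    `dddh` maps (strain_a, strain_b) tuples to dDDH percentages; either
    ordering of a pair is accepted, and missing pairs are ignored.
    """
    parent = {s: s for s in strains}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        value = dddh.get((a, b), dddh.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is the simplest possible grouping rule and can chain strains together: if A and B reach the threshold and B and C reach it, then A and C share a group even when their own dDDH value falls below it. A real analysis would need a clustering method that handles such inconsistencies explicitly.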