Genome Res 25(4) , 534-43 (Apr 1 2015)
Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long-read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 = approximately 2200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long-read data also revealed thousands of species whose abundances were <0.1% in all samples. Most of the organisms in this "long tail" of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities and can be effectively studied using a combination of long and short reads.