The genome assembly challenges posed by short sequence reads from sequencing platforms such as 454/Roche and Illumina are well documented; without reference genome data, piecing together myriad short DNA sequences is difficult. Researchers from the Georgia Institute of Technology and the DOE JGI wanted to determine how short reads from next-generation sequencing (NGS) platforms affect the assembly of individual genomes from complex microbial communities.
Sunset on Georgia’s Lake Lanier, which was a source of data samples used in this study.
As described in an article published in the April issue of The ISME Journal, the team used Illumina data consisting of 100-basepair paired-end sequences from soil and freshwater metagenomes, as well as simulated datasets. For example, they “spiked” a dataset from a freshwater planktonic community sampled from a lake in Atlanta, Georgia, with a reference bacterial genome, and then compared the genome recovered from the spiked metagenome against an assembly built from that genome's reads alone.
The team reported that they were able to accurately assemble a single genome from the complex community when that genome was sequenced to at least 20X coverage. At lower coverage, they added, they found more errors and problems with individual genome assembly.
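To give a sense of what 20X coverage means in practice, the sketch below uses the standard definition of average coverage: total sequenced bases divided by genome length. It is an illustration only, not the team's method. The function names and the 5 Mbp genome size are hypothetical; the 100 bp read length matches the Illumina data described above. In a metagenome, the read count refers only to reads that originate from the genome of interest, not to the whole dataset.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average per-base coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 5 Mbp bacterial genome and 100 bp reads.
genome_size = 5_000_000   # assumed genome length in base pairs
read_length = 100         # read length reported for the study's Illumina data

n = reads_needed(20, read_length, genome_size)
print(n)                                               # 1000000
print(expected_coverage(n, read_length, genome_size))  # 20.0
```

Under these assumptions, about one million 100 bp reads (500,000 read pairs) from the target genome would be needed to reach 20X. If that genome makes up only a small fraction of the community, the total sequencing effort required grows in proportion.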
“The results presented here reveal the errors and limitations as well as the strengths of metagenomics for population analysis, and provided practical standards and guidelines for experimental design and analysis,” the team concluded. “Some of our results should be independent of the NGS platform used and therefore broadly applicable to short-read sequencing.”