Advances in sequencing technology now allow genome researchers to generate the equivalent of a human genome in days, rather than the more than a decade it took multiple organizations to complete the first one. That shift has moved the bottleneck from sequence production to sequence assembly. The Sanger platform routinely produced reads 700 base pairs long, whereas the Illumina platform generates reads of 35-100 base pairs, and the shorter reads make assembly more challenging. DOE JGI researchers, led by Eukaryote Program head Dan Rokhsar and Jarrod Chapman, have developed an efficient way to assemble eukaryotic genomes from short reads using a computer algorithm called meraculous.
In a paper published August 18, 2011, in PLoS ONE, Chapman and his colleagues used meraculous to assemble 75-bp Illumina reads of the yeast Pichia stipitis, which ferments the five-carbon sugar xylose for ethanol production and was sequenced by the DOE JGI in 2007.
“The meraculous assembly reconstructs 95% of the Pichia genome in long contigs and scaffolds without any errors,” the researchers wrote in their paper, which includes a link from which the software can be downloaded. “Many stages of the meraculous algorithm are parallelized, and to document their scalability we describe an assembly of simulated data for the ~120 Mbp Arabidopsis thaliana genome, and show that for mammalian genomes the limiting memory structure requires less than 10 Gb of RAM.”
See Jarrod Chapman discuss meraculous in a video from the JGI/Argonne HPC Workshop.