MetaHipMer is a metagenome assembler designed to run on supercomputers and large clusters of compute nodes, performing coassembly of the largest datasets.
Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled on a single shared-memory computer. Such datasets are therefore often assembled sample by sample, an approach called multiassembly. Multiassembly captures strain variation across samples; however, combining the results of multiassemblies is laborious, and multiassembly may fail to detect low-abundance microbes. Coassembly recovers more of the genomes in an environment than multiassembly, at a higher degree of completeness, along with lower-abundance genomes that multiassembly cannot detect. A fast, scalable metagenome assembler lets a user more easily perform both coassembly and multiassembly, assembling both abundant genomes with high strain variation and rare, low-abundance genomes. MetaHipMer is being applied to terabyte-scale datasets that could never before be coassembled, and has the potential to discover rare and possibly novel lineages of microbial life.
MetaHipMer is available for public use under an open source license and can be downloaded from https://bitbucket.org/berkeleylab/mhm2/src/master/.