Comparative sequence analysis, whether between genomes or metagenomes, can significantly expand our biological knowledge by letting us ask questions directly of ecosystems and their inhabitants. The team proposes calculating MinHash signatures for the approximately 5,200 private microbial genomes and for all of the private and public metagenomes in the Integrated Microbial Genomes and Microbiomes (IMG/M) database. These signatures will allow researchers to quickly estimate how similar two sequence sets are. Incorporating the IMG/M private genome collection into a searchable Sequence Bloom Tree (SBT) index would facilitate taxonomic classification of DNA sequence sets, enhance the utility of the archive, and provide a valuable resource for rapid comparison of DNA sequence data.
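To illustrate how a MinHash signature lets two sequence sets be compared quickly, here is a minimal sketch of a bottom-k MinHash in Python. It is not the team's software or pipeline. The function names, the k-mer size (21), the signature size (500), and the choice of SHA-1 as the hash are all assumptions made for illustration, and the sketch skips details a real tool would handle, such as treating a k-mer and its reverse complement as the same.

```python
import hashlib


def kmers(seq, k):
    """Return the set of distinct k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq, k=21, num_hashes=500):
    """Keep only the num_hashes smallest k-mer hashes as the signature."""
    hashes = sorted({hash_kmer(km) for km in kmers(seq, k)})
    return hashes[:num_hashes]


def estimate_jaccard(sig_a, sig_b, num_hashes=500):
    """Estimate Jaccard similarity from two bottom-k signatures.

    Take the num_hashes smallest hashes of the combined signatures and
    count the fraction that appear in both signatures.
    """
    a, b = set(sig_a), set(sig_b)
    merged = sorted(a | b)[:num_hashes]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)
```

Because each signature holds at most a few hundred integers regardless of genome size, comparing two data sets costs about the same whether they contain thousands or billions of bases. That fixed cost is what makes comparison across an archive of this size practical.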
Proposer: Phillip Brooks, University of California, Davis
Proposal: Advancing Metagenome Classification and Comparison by MinHash Fingerprinting of IMG/M Data Sets