CheckV is the first fully automated, command-line tool for assessing the quality of metagenome-assembled viral genomes.
As input, the program takes a FASTA file containing the genomic sequence of a new virus and performs three main tasks: (1) it estimates genome completeness (0-100%) by comparison to a large database of environmentally diverse complete viral genomes; (2) it identifies closed genomes based on the presence of terminal repeats and provirus integration sites; and (3) it identifies host-virus boundaries for assembled proviruses and removes any detected non-viral regions. Based on these results, the program classifies each input sequence into one of five quality tiers: complete, high-quality (>90% completeness), medium-quality (50-90% completeness), low-quality (0-50% completeness), or undetermined-quality.
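The tier thresholds above can be illustrated with a minimal Python sketch. This is not CheckV's own code: the function name `classify_quality` and its parameters are hypothetical, and the handling of the exact boundary values (50% treated as medium-quality, 90% treated as medium-quality) is an assumption, since the text only gives the ranges.

```python
from typing import Optional


def classify_quality(completeness: Optional[float], is_complete: bool = False) -> str:
    """Map an estimated completeness (0-100%) to one of the five quality tiers.

    Hypothetical helper for illustration only; not part of the CheckV API.
    `is_complete` stands in for a closed-genome call (e.g. terminal repeats
    or provirus integration sites); `completeness=None` means no estimate
    could be made.
    """
    if is_complete:
        return "complete"
    if completeness is None:
        return "undetermined-quality"
    if not 0.0 <= completeness <= 100.0:
        raise ValueError("completeness must be between 0 and 100")
    if completeness > 90.0:
        return "high-quality"
    # Assumption: the 50% boundary is assigned to the medium-quality tier.
    if completeness >= 50.0:
        return "medium-quality"
    return "low-quality"


if __name__ == "__main__":
    for value in (95.2, 72.0, 12.5, None):
        print(value, classify_quality(value))
```

Putting the closed-genome check first reflects the ordering of the tiers in the text, where "complete" is a separate category rather than a completeness range.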
The CheckV pipeline and database are freely available at: https://bitbucket.org/berkeleylab/checkv.