Virophage discovery pipeline. (A) MCP amino acid sequences from reference isolated genomes and published metagenomic contigs were queried against the IMG/VR database with stringent E-value cutoffs. All homologous sequences detected were then clustered together to build four independent MCP profiles. (B) The resulting four MCP models were used to recruit additional homologous sequences from the entire IMG/M system. All new sequences were clustered, and models were built, creating a final set of 15 unique MCP HMMs. (C) These 15 unique MCP HMMs were then used to search two different databases for homologous sequences: the IMG/M system and a custom-assembled human gut database containing 3771 samples from NCBI’s Sequence Read Archive (SRA). (D) The resulting set of 28,294 non-redundant (NR) sequences with stringent E-value cutoffs was filtered by size and by the presence of the four core virophage genes (high-quality genomes; HQ virophages). Finally, completeness of novel metagenomic virophage genomes was predicted based on circularity or presence of inverted terminal repeats (ITR). (Figure from Paez-Espino et al. Microbiome (2019) 7:157 https://doi.org/10.1186/s40168-019-0768-5)
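The filtering and completeness checks in step (D) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The size bounds, the repeat length `k`, and the gene labels in `CORE_GENES` are hypothetical placeholders chosen for illustration, and circularity is approximated here as an exact direct repeat at both contig ends, which is an assumption rather than the published criterion.

```python
# Hypothetical labels for the four core virophage genes; not taken from the caption.
CORE_GENES = frozenset({"MCP", "penton", "ATPase", "protease"})

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_circular(seq: str, k: int = 20) -> bool:
    """Assumed heuristic: the first k bases reappear unchanged at the 3' end."""
    return len(seq) >= 2 * k and seq[:k].upper() == seq[-k:].upper()


def has_itr(seq: str, k: int = 20) -> bool:
    """Assumed heuristic: the 3' end is the reverse complement of the 5' end."""
    return len(seq) >= 2 * k and seq[:k].upper() == reverse_complement(seq[-k:]).upper()


def is_high_quality(length: int, genes: set,
                    min_len: int = 10_000, max_len: int = 42_000) -> bool:
    """Size filter plus presence of all four core genes (bounds are placeholders)."""
    return min_len <= length <= max_len and CORE_GENES <= set(genes)


def is_predicted_complete(seq: str, k: int = 20) -> bool:
    """A genome is predicted complete if it looks circular or carries ITRs."""
    return is_circular(seq, k) or has_itr(seq, k)
```

Exact end matching is the simplest possible criterion; a real pipeline would more likely tolerate mismatches or use an alignment-based check.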