To understand virus-host dynamics, computation helps fill in what cultivation can’t.
The Science
On their own, viruses are merely toolkits. To do anything, they must find a host, and not just any host will do. It must be a specific host the virus has adapted to commandeer. For bacteriophage viruses, these hosts are microbes like bacteria, not humans. With metagenomic sequencing, researchers have found more of these viruses than ever before, in all kinds of ecosystems. However, understanding what these viruses can do requires matching their genetic sequences to their hosts. Building on existing virus-host prediction approaches, researchers have created a new program called iPHoP (pronounced “eye-pop”, freely available online). It combines and evaluates multiple predictions to reliably match viruses with their archaeal and bacterial hosts.
The Impact
Within the domains of archaea and bacteria, millions of microbes govern ecosystems. These organisms carry out important environmental processes like carbon fixation, nitrogen cycling and methane production. Meanwhile, environmental viruses are constantly infecting and reprogramming these microbes. A better understanding of virus-host interactions opens the possibility of using phages to engineer microbial communities. One day, these phages could boost plant-microbe interactions, nutrient cycling, or carbon sequestration.
Summary
Existing programs use a variety of approaches to match a virus with a potential host, so different programs can return different predictions for the same virus. The iPHoP program brings several of these approaches together into the same workflow, then uses a machine-learning model to give its integrated prediction a confidence score. The result is a genus-level host prediction tool that draws on the strengths of multiple virus-host prediction methods.
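The idea of integrating several predictions into one scored result can be illustrated with a minimal Python sketch. This is not the iPHoP implementation: the method names, weights, and bias below are hypothetical placeholders, and a simple logistic combination stands in for whatever model a real tool would learn from known virus-host pairs.

```python
import math

# Hypothetical weights for three made-up prediction methods. In a real tool
# these would be learned from known virus-host pairs, not set by hand.
WEIGHTS = {"alignment": 2.0, "crispr": 3.0, "kmer": 1.0}
BIAS = -3.0


def integrate_predictions(predictions):
    """Combine per-method host predictions into one confidence per genus.

    predictions: list of (method, genus, score) tuples, score in [0, 1].
    Returns a list of (genus, confidence) sorted from most to least confident.
    """
    per_genus = {}
    for method, genus, score in predictions:
        features = per_genus.setdefault(genus, {})
        # Keep the strongest score each method gives for this genus.
        features[method] = max(score, features.get(method, 0.0))

    ranked = []
    for genus, features in per_genus.items():
        # Weighted sum of the method scores, squashed to (0, 1) by a logistic.
        z = BIAS + sum(WEIGHTS[m] * s for m, s in features.items())
        ranked.append((genus, 1.0 / (1.0 + math.exp(-z))))

    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy input: two methods agree on one genus, a third suggests another.
example = [
    ("alignment", "Escherichia", 0.9),
    ("crispr", "Escherichia", 0.8),
    ("kmer", "Salmonella", 0.7),
]
ranked = integrate_predictions(example)
for genus, confidence in ranked:
    print(f"{genus}: {confidence:.2f}")
```

In this toy case, the genus supported by two methods ranks above the genus supported by one, which is the behavior an integrated, confidence-scored prediction is meant to capture.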
In building this program, described in PLOS Biology, the research team led by scientists at the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, aimed to streamline a process they often carried out by hand: generating virus-host predictions via multiple routes, then weighing the different suggested matches. The machine learning model in the iPHoP program provides a consistent way of evaluating these different routes. To train the model, the team used 1,870 known virus-host pairs to generate a matrix of data points.
Applied to 216,015 high-quality virus sequences in the IMG/VR database, the iPHoP program generated many new high-confidence host predictions across a variety of ecosystems. In particular, it predicted many more likely hosts for viruses in human microbiome samples, reflecting factors such as the quality and number of available human microbiome datasets. In other environments, such as terrestrial soil, having additional isolates and metagenomic datasets will strengthen future virus-host predictions. This information could open the door to improving crop yields and to better understanding the roles viruses and their host microbes play in nutrient cycles.
Contacts
BER Contact
Ramana Madupu, Ph.D.
Program Manager
Biological Systems Sciences Division
Office of Biological and Environmental Research
Office of Science
Department of Energy
PI Contact
Simon Roux
Viral Genomics Group Lead
DOE Joint Genome Institute
Funding
This work was supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research, Early Career Research Program awarded under UC-DOE Prime Contract DE-AC02-05CH11231. The work conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231 (SR, APC, SN).
Publications
Roux, S. et al. “iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria.” PLOS Biology 21(4): e3002083 (2023). DOI: 10.1371/journal.pbio.3002083
Related Links
- IMG/VR database: genomes of cultivated and uncultivated viruses
- JGI Science Highlight: A History of Phage-Host Interactions With Help From CRISPRs
- Intern Spotlight: Exploring Phages at JGI