To understand virus-host dynamics, computation helps fill in what cultivation can’t.
The Science
On their own, viruses are merely toolkits. To do anything, they must find a host, and not just any host will do: it must be a specific host the virus has adapted to commandeer. For bacteriophage viruses, these hosts are microbes like bacteria, not humans. With metagenomic sequencing, researchers have found more of these viruses than ever before, in all kinds of ecosystems. Understanding what these viruses can do, however, depends on matching their genetic sequences to their hosts. Building on existing virus-host prediction approaches, researchers have created a new program called iPHoP (pronounced “eye-pop”), which is freely available online. It combines and evaluates multiple predictions to reliably match viruses with their archaeal and bacterial hosts.
The Impact
Within the domains of archaea and bacteria, millions of microbes govern ecosystems. These organisms carry out important environmental processes like carbon fixation, nitrogen cycling and methane production. Meanwhile, environmental viruses are constantly infecting and reprogramming these microbes. A better understanding of virus-host interactions opens the possibility of using phages to engineer microbial communities. One day, these phages could boost plant-microbe interactions, nutrient cycling, or carbon sequestration.
Summary
Existing programs use a variety of approaches to match a virus with a potential host. This can result in different predictions. The iPHoP program brings several of these approaches together into the same workflow, then uses a machine-learning model to give its integrated prediction a confidence score. The result is a genus-level host prediction tool that draws on the strengths of multiple virus-host prediction methods.
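To make the idea of an integrated, scored prediction concrete, here is a minimal Python sketch. It is not the iPHoP model or its code: the method names, the reliability weights, and the simple "noisy-OR" combination rule are all illustrative assumptions, standing in for the trained machine-learning model described above. It only shows how several per-method host guesses might be merged into one genus-level call with a confidence score.

```python
from collections import defaultdict

# Hypothetical reliability of each prediction method (illustrative values only,
# not taken from iPHoP). A higher weight means the method is trusted more.
METHOD_WEIGHT = {"blast": 0.9, "crispr": 0.95, "kmer": 0.6}


def integrate(predictions, min_confidence=0.9):
    """Combine per-method host guesses into one genus-level prediction.

    predictions: list of (method, genus, score) tuples, with score in [0, 1].
    Returns (genus, confidence), or None if no genus reaches min_confidence.
    """
    # For each genus, track the probability that every supporting method is wrong.
    p_all_wrong = defaultdict(lambda: 1.0)
    for method, genus, score in predictions:
        p_all_wrong[genus] *= 1.0 - METHOD_WEIGHT[method] * score

    if not p_all_wrong:
        return None

    # The best-supported genus is the one least likely to be wrong everywhere.
    genus, p_wrong = min(p_all_wrong.items(), key=lambda kv: kv[1])
    confidence = 1.0 - p_wrong
    return (genus, confidence) if confidence >= min_confidence else None


# Example: two methods agree on one genus, a third suggests another.
example = [
    ("blast", "Escherichia", 0.8),
    ("crispr", "Escherichia", 0.9),
    ("kmer", "Salmonella", 0.7),
]
result = integrate(example)
```

In this toy example, agreement between two independent methods pushes the combined confidence for one genus above the threshold, while a lone, weaker signal for another genus is set aside. A trained model, as in iPHoP, learns how much to trust each signal from known virus-host pairs rather than relying on hand-picked weights.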
In building this program, described in PLOS Biology, the research team led by scientists at the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, aimed to streamline a process they often took on: generating virus-host predictions via multiple routes, then weighing the different suggested matches. The machine-learning model in the iPHoP program provides a way of evaluating these different routes. To train the model, the team used 1,870 known virus-host pairs to generate a matrix of data points.
Applied to 216,015 high-quality virus sequences in the IMG/VR database, the iPHoP program generated many new high-confidence host predictions across a variety of ecosystems. In particular, it predicted many more likely hosts for viruses in human microbiome samples, reflecting factors such as the quality and number of human microbiome datasets. In other environments, such as terrestrial soil, additional isolates and metagenomic datasets will strengthen future virus-host predictions. This information could open the door to improving crop yields and to better understanding the roles viruses and their host microbes play in nutrient cycles.