Frontiers in Microbiology 10 (Mar 20 2019)
Amplicon sequencing of 16S, ITS, and 18S regions of microbial genomes is a commonly used first step toward understanding microbial communities of interest for human health, agriculture, and the environment. Correlation network analysis is an emerging tool for investigating the interactions within these microbial communities. However, when data from different habitats (e.g., sampling sites or host genotypes) are combined into one analysis, habitat filtering (co-occurrence of microbes due to the habitat sampled rather than to biological interactions) can induce apparent correlations, resulting in a network dominated by habitat effects and masking correlations of biological interest. We developed an algorithm to correct for habitat filtering effects in microbial correlation network analysis in order to reveal the true underlying microbial correlations. This algorithm was tested on simulated data constructed to exhibit habitat filtering. Our algorithm significantly improved correlation detection accuracy for these data compared to Spearman and Pearson correlations. We then used our algorithm to analyze two real data sets of 16S variable region amplicon sequences that were expected to exhibit habitat filtering. Our algorithm was found to effectively reduce habitat effects, enabling the construction of consensus correlation networks from data sets combining multiple related sample habitats.
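
The habitat filtering effect described above can be illustrated with a small simulation. The following is a minimal sketch and not the algorithm developed in this work: it assumes two hypothetical taxa whose abundances are independent within each habitat but share a habitat-specific shift, and it uses simple per-habitat mean centering as a stand-in correction. The function name, the habitat labels, and the shift size are all illustrative assumptions.

```python
import numpy as np

def pooled_and_centered_correlation(x, y, habitat):
    """Return (pooled Pearson r, Pearson r after per-habitat mean centering).

    Illustrative only: per-habitat centering is an assumed stand-in
    correction, not the method described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    habitat = np.asarray(habitat)

    # Correlation computed on the combined samples from all habitats.
    pooled = np.corrcoef(x, y)[0, 1]

    # Subtract each habitat's mean so that only within-habitat
    # variation contributes to the correlation.
    xc, yc = x.copy(), y.copy()
    for h in np.unique(habitat):
        mask = habitat == h
        xc[mask] -= x[mask].mean()
        yc[mask] -= y[mask].mean()
    centered = np.corrcoef(xc, yc)[0, 1]
    return pooled, centered

# Simulated data: two taxa with no within-habitat association,
# both more abundant in hypothetical habitat "B" than in "A".
rng = np.random.default_rng(0)
n = 200
habitat = np.repeat(["A", "B"], n)
shift = np.where(habitat == "B", 5.0, 0.0)
x = rng.normal(size=2 * n) + shift
y = rng.normal(size=2 * n) + shift

pooled, centered = pooled_and_centered_correlation(x, y, habitat)
print(f"pooled r = {pooled:.2f}, habitat-centered r = {centered:.2f}")
```

In this sketch the pooled correlation is strongly positive even though the two taxa do not interact, while the habitat-centered correlation is close to zero. This is the kind of habitat-induced signal that a correction method needs to remove before a network is built.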