PLOS Computational Biology 17(3), 21 (Mar 2021)
Understanding CRISPR-Cas systems, the adaptive defence mechanism that about half of bacterial species and most archaea use to neutralise viral attacks, is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates small pieces of phage genomes, called spacers, into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is the distribution of spacers across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, with a standard deviation greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics that fits observed data from almost four thousand metagenomes well. In analogy to the classical ‘rich-get-richer’ mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and for why proliferation of phages can be highly successful despite the effectiveness of CRISPR-Cas systems.

Author summary

About half of bacterial species and most archaea are equipped with CRISPR-Cas systems of adaptive immunity to protect them from their natural enemies, bacteriophages. The memory of CRISPR-Cas contains a catalogue of the fingerprints of previously encountered offenders, which is passed down to the bacterial progeny. The microbial resistance to viruses largely depends on the number of records in this CRISPR array.
Our analysis combining metagenomic data and mathematical modelling shows that the size of CRISPR arrays in microbial populations generally follows a power law distribution. Power law distributions have been found in many other complex systems (earthquakes, financial markets, animal movement). We argue that our model explains the presence of a power law in CRISPR arrays and the rarity of all-resistant super microbes.
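
The 'rich-get-richer' idea, in which an array gains spacers at a rate proportional to its current size, can be illustrated with a small simulation. The sketch below is not the model developed in the paper: it is a generic Simon-type preferential attachment process, it omits spacer loss entirely, and the function name `simulate_spacer_counts` and the parameters `steps`, `p_new` and `seed` are hypothetical choices made only for illustration. It is meant to show how size-proportional acquisition alone can produce a heavy-tailed distribution of array sizes whose standard deviation exceeds its mean.

```python
import random
import statistics


def simulate_spacer_counts(steps=20000, p_new=0.1, seed=0):
    """Toy Simon-type process (an illustrative assumption, not the paper's model).

    At each step, with probability p_new a new array holding one spacer
    appears; otherwise one existing array gains a spacer, and that array is
    chosen with probability proportional to its current number of spacers.
    Spacer loss is deliberately left out of this sketch.
    """
    rng = random.Random(seed)
    sizes = [1]   # sizes[i] = number of spacers in array i
    owners = [0]  # owners[k] = index of the array that holds spacer k
    for _ in range(steps):
        if rng.random() < p_new:
            # A new array enters the population with a single spacer.
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # Picking a uniformly random spacer selects its array with
            # probability proportional to that array's size.
            i = rng.choice(owners)
            sizes[i] += 1
            owners.append(i)
    return sizes


sizes = simulate_spacer_counts()
mean_size = statistics.mean(sizes)
sd_size = statistics.stdev(sizes)
print("arrays:", len(sizes))
print("mean spacers per array:", round(mean_size, 2))
print("standard deviation:", round(sd_size, 2))
print("largest array:", max(sizes))
```

In this toy process most arrays remain short while a few early arrays accumulate a large share of all spacers, which is the qualitative pattern described above. Reproducing the empirical fits would require the loss term and the parameter values of the actual model, which are not given in this section.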