Cas14 proteins were discovered in JGI’s IMG/M database and biochemically characterized at UC Berkeley and the Innovative Genomics Institute.

The Science

Researchers report the discovery of miniature Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) associated proteins that can target single-stranded DNA (ssDNA). The discovery was made possible by mining the datasets in the Integrated Microbial Genomes and Microbiomes (IMG/M) suite of tools managed by the U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science User Facility. The sequences were then biochemically characterized by a team led by Jennifer Doudna’s group at the University of California (UC), Berkeley, which is also affiliated with the Innovative Genomics Institute.

The Impact

The ability to accurately edit genomes (that is, to repair gene mutations and to delete or add genes in a precise fashion) has applications across many areas. In particular, gene editing is being used to develop drought-, flood-, and pest-resistant crops, as well as better-yielding ones. On the clinical side, gene editing is being advanced as a potential therapy for both genetic and complex diseases. Finally, gene editing is being used to understand how a person’s genetic makeup predisposes them to, or protects them from, disease. Much of the work on genome editing has focused on the seminal CRISPR-Cas9 system, which targets double-stranded DNA. The discovery of Cas proteins that can target single-stranded DNA molecules broadens the range of applications for CRISPR-Cas systems. It also underscores the untapped potential waiting to be unearthed by sequencing and analyzing uncultivated microbes.

Summary

The CRISPR-Cas system is an immune mechanism in bacteria and archaea that confers resistance to foreign genetic elements by incorporating short sequences from infecting viruses and phages. In the event of a new infection, the microbes use the genetic information stored in the CRISPR sequences to recognize the virus and deploy Cas enzymes that cut its DNA and disable it. In Science, a team led by researchers from the University of California, Berkeley, reports the identification of active Cas enzymes, dubbed Cas14, that target ssDNA. In contrast, the seminal Cas9 proteins cleave double-stranded DNA.

Co-first author Lucas Harrington was a graduate student in the lab of study senior author Jennifer Doudna, and he worked with co-first author David Burstein, then a postdoctoral fellow with Doudna and longtime JGI collaborator Jill Banfield, also at UC Berkeley. Harrington is now at Mammoth Biosciences, while Burstein is now at Tel Aviv University.

The Cas14 proteins are ~400–700 amino acids (aa) long, roughly half the size of previously known class 2 CRISPR enzymes, which are typically 950–1400 aa. They were initially identified by searching for Cas12d homologs across IMG/M’s assembled metagenomic data. Some of the hits were shorter than typical Cas12d proteins and were also found next to cas1 genes. Further analysis led to the identification of a new family of Cas proteins, named Cas14. Using these sequences as a starting point, JGI data scientist David Paez-Espino in Nikos Kyrpides’ Microbiome Data Science group mined the IMG/M system, which holds a large collection of publicly accessible metagenomic datasets from a wide variety of ecosystems around the world, conducting iterative searches and using statistical analyses to progressively refine the process.
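To make the size contrast concrete, the following is a minimal Python sketch that sorts protein sequences into the two length ranges quoted above. It is an illustration only and not the team's pipeline: the actual search relied on homology to Cas12d and on iterative, statistically refined searches of IMG/M, none of which is reproduced here. The function names (`parse_fasta`, `size_class`, `classify`) and the toy sequences are hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Length ranges (in amino acids) as quoted in the text; both are approximate.
CAS14_RANGE = (400, 700)
TYPICAL_CLASS2_RANGE = (950, 1400)


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the identifier.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def size_class(length: int) -> str:
    """Label a protein length by the range it falls in (assumed labels)."""
    if CAS14_RANGE[0] <= length <= CAS14_RANGE[1]:
        return "cas14-sized"
    if TYPICAL_CLASS2_RANGE[0] <= length <= TYPICAL_CLASS2_RANGE[1]:
        return "typical-class2-sized"
    return "other"


def classify(text: str) -> Dict[str, str]:
    """Map each FASTA identifier to its size label."""
    return {name: size_class(len(seq)) for name, seq in parse_fasta(text)}


if __name__ == "__main__":
    # Toy placeholder sequences, not real proteins.
    toy = (
        ">hypothetical_small\n" + "M" * 529 + "\n"
        ">hypothetical_large\n" + "M" * 1200 + "\n"
    )
    for name, label in classify(toy).items():
        print(name, label)
```

Length alone cannot identify a Cas14 protein; in a real workflow a filter like this would at most help triage candidates that a homology search has already flagged.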

The searches yielded several CRISPR-Cas systems, and based on experiments conducted by Doudna’s lab at UC Berkeley, close to 40 CRISPR-Cas14 systems belonging to eight subtypes were identified. Additionally, using Cas14a, the team developed Cas14-DETECTR, which allows CRISPR-based detection of ssDNA pathogens.

With few exceptions, the Cas14 proteins identified were found within the archaeal superphylum DPANN, named by a JGI-led team for the first five groups discovered: Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaea. This work showcases the unique capabilities of the IMG/M database in enabling new discoveries and continues an earlier collaboration between the JGI and the Doudna lab on the discovery of thermostable Cas9 genes.

The work also used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory.

Datasets
Cas14 proteins on IMG/M portal

Contacts

PI Contacts

Jennifer Doudna
University of California, Berkeley
