Published in:
Nucleic Acids Research 51(W1), W108-W114 (2023)
Author(s):
DOI:
10.1093/nar/gkad385
Abstract:
Carbohydrate-processing enzymes (CAZymes) are classified into families based on sequence and three-dimensional fold. Because many CAZyme families contain members with diverse molecular functions (different EC numbers), sophisticated tools are required to further delineate these enzymes. Such delineation is provided by the peptide-based clustering method CUPP (Conserved Unique Peptide Patterns). CUPP operates synergistically with the CAZy family/subfamily categorizations to allow systematic exploration of CAZymes by defining small protein groups with shared sequence motifs. The updated CUPP library contains 21,930 such motif groups comprising 3,842,628 proteins. The new implementation of the CUPP webserver, https://cupp.info/, now includes all published fungal and algal genomes from the Joint Genome Institute (JGI) genome resources MycoCosm and PhycoCosm, dynamically subdivided into motif groups of CAZymes. This allows users to browse the JGI portals for specific predicted functions or specific protein families from genome sequences, so that a genome can be searched for proteins with specific characteristics. Each JGI protein has a hyperlink to a summary page, which links to the predicted gene splicing and shows which regions have RNA support. The new CUPP implementation also includes an update of the annotation algorithm that uses only a quarter of the RAM while enabling multi-threading, providing an annotation time below 1 ms per protein.