palmdb is a database of viral polymerase palmprint (barcode) sequences classified by clustering sequences into species-like operational taxonomic units (OTUs) at 90% amino acid identity.
palm_annot is a command-line tool to identify RdRp sequences and annotate the palmprints within them.
palmID is a web tool to query palmdb with an RdRp sequence and retrieve matches in the SRA.
The hallmark gene for RNA viruses is the RNA-dependent RNA polymerase (RdRp). Palmprints are a ~100 aa sub-sequence of RdRp delineated by the conserved catalytic motifs "A", "B", and "C" in the palm sub-domain.
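To make the 90% amino acid identity threshold concrete, here is a minimal sketch of identity-based grouping. It is not the clustering procedure used to build palmdb, which this page does not describe. It is a generic greedy centroid scheme, and it assumes the sequences are already aligned to equal length. The function names and the toy sequences are hypothetical.

```python
from typing import List, Sequence


def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: Sequence[str], threshold: float = 0.90) -> List[int]:
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns one cluster index per input sequence.
    """
    centroids: List[str] = []
    assignment: List[int] = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if aligned_identity(s, c) >= threshold:
                assignment.append(i)
                break
        else:
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return assignment


# Toy, made-up 10-residue sequences (real palmprints are ~100 aa).
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIVV"]
clusters = greedy_cluster(toy)
```

In this toy input the second sequence differs from the first at one of ten positions, so it joins the first cluster. The third differs at two positions, so it starts a new one.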
- Release: palmdb.current.tar.gz (4 GB)
- sOTU palmprints: sotus.palmprint.faa (63 MB)
- All-unique palmprints, unclustered: unique.palmprint.faa (131.3 MB)
- sOTU source labels: label_sotu.tsv (178.2 MB)
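The .faa files can be read with a few lines of Python. The sketch below assumes they are plain FASTA text (a ">" header line followed by one or more sequence lines), which is the usual convention for that extension but is not confirmed here. The header is treated as an opaque label because its format is not documented on this page. `read_fasta` is a hypothetical helper, and the sample records are made up.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample standing in for a real palmprint file.
sample = io.StringIO(">toy1\nACDEFG\nHIKL\n>toy2\nMNPQ\n")
records = list(read_fasta(sample))
lengths = [len(seq) for _, seq in records]
```

With a downloaded copy, the same function accepts an open file handle, for example `with open("sotus.palmprint.faa") as fh: records = list(read_fasta(fh))`, assuming the file sits in the working directory.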
Source-original sequences are available in the raw/ folder of the Current Release.
- Babaian and Edgar, 2022. Ribovirus classification by a polymerase barcode sequence. PeerJ.
- Edgar et al., 2022. Petabase-scale sequence alignment catalyses viral discovery. Nature.