title | layout | excerpt | sitemap | permalink |
---|---|---|---|---|
HLi Lab - Software | textlay | Software and Resources | false | /software |
Our lab has developed several alignment and assembly algorithms critical to high-throughput sequence analysis. These include samtools, BWA, minimap2 and hifiasm, each cited more than 1,000 times per year. We also explore a variety of algorithms related to variant calling (e.g. longcallR and longcallD), pangenome analysis (e.g. minigraph and pangene), protein alignment (e.g. miniprot), full-text indexing (e.g. ropebwt3), immunology (e.g. Immuannot and T1K), evolution (e.g. psmc and compleasm) and high-performance data structures in general (e.g. bedtk and BGT). Most of our tools still work years after their initial publication and are often well received.
- longcallD: small and large variant calling from long genomic reads, unpublished.
- longcallR: SNP calling and haplotype-specific analysis for long RNA-seq reads, unpublished.
- ropebwt3: construction and utility of BWT for DNA string sets, published in Li (2014) and Li (2024).
- Immuannot: annotating HLA and KIR genes in phased assemblies, published in Zhou et al (2024).
- pangene: constructing pangenome gene graphs, published in Li et al (2024).
- compleasm: a reimplementation of BUSCO for evaluating the gene completeness of an assembly, published in Huang and Li (2023).
- srf: assembling satellite DNA, published in Zhang et al (2023).
- miniprot: protein-to-genome alignment allowing splicing and frameshift, published in Li (2023).
- bedtk and cgranges: a fast toolkit and library for working with BED files, published in Li and Rong (2020).
- yak: k-mer counting and assembly evaluation, developed for Cheng et al (2021).
- gwfa: graph wavefront alignment with edit distance, preprinted in Zhang et al (2022). Merged into gfatools and used by minigraph.
- minigraph: pangenome construction and sequence-to-graph alignment, published in Li et al (2020).
- dipcall: variant calling for phased diploid assemblies, developed for Li et al (2019).
- minimap2: widely used long-read aligner, published in Li (2018) and improved in Li (2021).
- miniasm: a simple long-read assembler, published in Li (2016). Useful for assembly at small scale; not recommended for production.
- BWA: widely used short-read aligner, published in Li and Durbin (2009), Li and Durbin (2010) and Li (2013).
- minipileup: simple pileup-based variant caller, unpublished.
- seqtk: a small toolkit for manipulating sequences in FASTA/FASTQ, unpublished.
- gfatools: a toolkit for working with graphs in the GFA format, unpublished.
- miniwfa: a low-memory reimplementation of the wavefront alignment algorithm. Unpublished but used in minigraph.
- jstreeview: interactive phylogenetic tree viewer/editor in JavaScript, unpublished.
- ntsm: detecting sample swaps, published in Chu and Li (2024).
- hifiasm: genome assembly with PacBio HiFi, Nanopore and Hi-C data, published in Cheng et al (2021), Cheng et al (2022) and Cheng et al (2024). Maintained by Haoyu Cheng.
- hifiasm-meta: metagenome assembly with PacBio HiFi, published in Feng et al (2022) and Feng et al (2024). Maintained by Xiaowen Feng.
- T1K: HLA and KIR genotyping with short reads, published in Song et al (2023). Maintained by Li Song.
- chromap: aligning short ChIP-seq, ATAC-seq or Hi-C reads, published in Zhang et al (2021). Maintained by Haowen Zhang and Li Song.
- hifieval: evaluating error correction accuracy for HiFi data, published in Guo et al (2023).
- tabix: indexing and querying coordinate-sorted formats such as VCF and BED, published in Li (2011). Now part of the samtools project.
- samtools: utilities for manipulating alignments in the SAM format. Initially published in Li et al (2009), Li (2011a) and Li (2011b). Maintained by Sanger since 2013.
- TreeBeST: the core engine behind TreeFam for tree building. Some components are described in the PI's thesis. Maintained by Ensembl Compara.
- dna-nn: modeling and predicting short DNA sequence features with neural networks, published in Li (2019).
- hickit: 3D modeling for single-cell Hi-C, developed for Tan et al (2018). It was not used in that paper but was used in Longzhi Tan's later work.
- BGT: fast and lightweight genotype query across many samples, published in Li (2016).
- fermi, fermi2 and FermiKit: short-read assemblers, published in Li (2012) and Li (2015).
- fermi-lite: a library in C for short-read assembly in small regions, adapted from FermiKit.
- BFC: correcting sequencing errors in short reads, published in Li (2015).
- bioawk: BWK awk modified for biological data, unpublished.
- psmc: inferring historical population sizes from a diploid genome, published in Li and Durbin (2011).
- MAQ: short-read aligner, published in Li et al (2008). It still works, but there is no point in using it now.
- Ropebwt3 indices for human and for bacteria, initially published in Li (2024).
- Pangene graphs, initially published in Li et al (2024).
- Minigraph graphs, initially published in Li et al (2020).
- Immuannot annotations, initially published in Zhou et al (2024).
- Manually curated locations of centromeric repeats for variant filtering.
- Human reference genome analysis sets including BWA and Bowtie2 indices.
- Portable binaries for samtools v1.14 and for GCC v10.3.0 on CentOS 7.
- Haplotype-resolved PGP1 assembly.
- Easy genomic regions for short-read variant calling.
- TreeFam: database of animal gene trees. Published in Li et al (2006) and Ruan et al (2008), and described in the PI's thesis. No longer maintained since 2013.