Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm


Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
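For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of standard, read-level trio binning: each long read is assigned to a parent according to the parent-specific k-mers it contains. It is not hifiasm's graph trio binning, which is not reproduced here. The function names, the `min_hits` threshold and the toy sequences are illustrative assumptions; a practical implementation would use canonical k-mers, much longer k and count-based filtering of parental k-mers.

```python
def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for brevity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def parent_specific_kmers(paternal_reads, maternal_reads, k):
    """Return (paternal-only, maternal-only) k-mer sets from parental short reads."""
    pat = {km for read in paternal_reads for km in kmers(read, k)}
    mat = {km for read in maternal_reads for km in kmers(read, k)}
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k, min_hits=1):
    """Assign one read to 'paternal', 'maternal' or 'ambiguous'.

    min_hits is a hypothetical threshold: the winning parent must
    contribute at least this many parent-specific k-mers.
    """
    pat_hits = 0
    mat_hits = 0
    for km in kmers(read, k):
        if km in pat_only:
            pat_hits += 1
        elif km in mat_only:
            mat_hits += 1
    if pat_hits >= min_hits and pat_hits > mat_hits:
        return "paternal"
    if mat_hits >= min_hits and mat_hits > pat_hits:
        return "maternal"
    return "ambiguous"


if __name__ == "__main__":
    # Toy parental reads that differ only at their 3' ends (made-up data).
    pat_only, mat_only = parent_specific_kmers(["ACGTACGTTT"], ["ACGTACGAAA"], k=4)
    for read in ["TACGTTT", "TACGAAA", "ACGTACG"]:
        print(read, bin_read(read, pat_only, mat_only, k=4))
```

Because every read is labeled independently in this scheme, a read with few or misleading parent-specific k-mers can be left unassigned or placed in the wrong bin.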

Fig. 1: Outline of the hifiasm algorithm.
Fig. 2: Effect of false read binning.

Data availability

All HiFi data were obtained from the NCBI Sequence Read Archive: SRR11606869 for Z. mays, SRR11606870 for M. musculus, SRR11606867 for F. × ananassa, SRR11606868 and SRR12048570 for R. muscosa, SRP251156 for S. sempervirens, SRR11292120–SRR11292123 for CHM13, ERX3831682 for HG00733, and four runs (SRR10382244, SRR10382245, SRR10382248 and SRR10382249) for HG002. For trio binning and computing QV, short reads were also downloaded: SRR7782677 for HG00733, ERR3241754 for HG00731 (father), ERR3241755 for HG00732 (mother) and SRX1082031 for CHM13. GIAB’s ‘homogeneity Run01’ short-read runs were used for the HG002 trio. These HG002 reads were downsampled to 30-fold coverage. The BAC libraries of CHM13 and HG00733 can be found at https://www.ncbi.nlm.nih.gov/nuccore/?term=VMRC59+and+complete/ and https://www.ncbi.nlm.nih.gov/nuccore/?term=VMRC62+and+complete/, respectively. The HG002 major histocompatibility complex reference sequences can be found at https://github.com/NCBI-Hackathons/TheHumanPangenome/tree/master/MHC/assembly/MHCv1.1/ (ref. 26). For BUSCO, the Embryophyta, Tetrapoda and Mammalia datasets are available at https://busco-data.ezlab.org/v4/data/lineages/embryophyta_odb10.2020-09-10.tar.gz, https://busco.ezlab.org/v2/datasets/tetrapoda_odb9.tar.gz and https://busco.ezlab.org/v2/datasets/mammalia_odb9.tar.gz, respectively. The CHM13 reference (v0.9) generated by the T2T consortium can be found at https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/chm13.draft_v0.9.fasta.gz. The hifiasm assemblies produced in this work are available at https://zenodo.org/record/4393631 and https://zenodo.org/record/4393750.
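The downsampling of the HG002 short reads to 30-fold coverage can be illustrated with a short sketch. The tool and parameters actually used are not stated here, so the code below only shows the underlying arithmetic and one simple way to apply it: the fraction of reads to keep is the target coverage divided by the current coverage, where coverage is total sequenced bases divided by genome size. The function names, the random per-read sampling and all numbers in the example are assumptions for illustration.

```python
import random


def downsample_fraction(total_bases, genome_size, target_coverage):
    """Fraction of reads to keep so that expected coverage equals the target.

    Coverage is total_bases / genome_size. The result is capped at 1.0
    because reads cannot be added by sampling.
    """
    if total_bases <= 0 or genome_size <= 0 or target_coverage <= 0:
        raise ValueError("all inputs must be positive")
    current_coverage = total_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def subsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the selection reproducible. For paired-end data the
    same decision would have to be applied to both mates of a pair.
    """
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    # Hypothetical numbers: 180 Gb of reads over a 3 Gb genome is 60-fold.
    frac = downsample_fraction(180_000_000_000, 3_000_000_000, 30)
    reads = [f"read{i}" for i in range(1000)]
    kept = subsample(reads, frac, seed=42)
    print(frac, len(kept))
```

Random per-read sampling reaches the target only in expectation; the realized coverage varies slightly with the seed and with the read-length distribution.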

Code availability

Hifiasm code is available at https://github.com/chhylp123/hifiasm/.


Acknowledgements

This study was supported by grants from the US National Institutes of Health (R01HG010040, U01HG010961 and U41HG010972 to H.L.).

Author contributions

H.C. and H.L. designed the algorithm, implemented hifiasm and drafted the manuscript. H.C. benchmarked hifiasm and other assemblers. G.T.C. ran hifiasm for S. sempervirens, HiCanu for R. muscosa, Peregrine for S. sempervirens and R. muscosa, and Falcon-Unzip for all datasets. X.F. helped with evaluation of the manuscript. H.Z. provided valuable suggestions for error correction and ran BUSCO.

Correspondence to Heng Li.

Competing interests

G.T.C. is an employee of PacBio. H.L. is a consultant of Integrated DNA Technologies and on the Scientific Advisory Boards of Sentieon, BGI and OrigiMed.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peer review information: Nature Methods thanks Benedict Paten and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lin Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Supplementary Information

Supplementary software commands, Supplementary Tables 1–10 and Supplementary Fig. 1.

Reporting Summary

Cheng, H., Concepcion, G.T., Feng, X. et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021). https://doi.org/10.1038/s41592-020-01056-5

