Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data
- Guccione, Caitlin;
- Patel, Lucas;
- Tomofuji, Yoshihiko;
- McDonald, Daniel;
- Gonzalez, Antonio;
- Sepich-Poore, Gregory D;
- Sonehara, Kyuto;
- Zakeri, Mohsen;
- Chen, Yang;
- Dilmore, Amanda Hazel;
- Damle, Neil;
- Baranzini, Sergio E;
- Hightower, George;
- Nakatsuji, Teruaki;
- Gallo, Richard L;
- Langmead, Ben;
- Okada, Yukinori;
- Curtius, Kit;
- Knight, Rob
- et al.
Published Web Location
https://doi.org/10.1038/s41467-025-56077-5
Abstract
As next-generation sequencing technologies produce deeper genome coverages at lower costs, there is a critical need for reliable computational host DNA removal in metagenomic data. We find that insufficient host filtration using prior human genome references can introduce false sex biases and inadvertently permit flow-through of host-specific DNA during bioinformatic analyses, which could be exploited for individual identification. To address these issues, we introduce and benchmark three host filtration methods of varying throughput, with concomitant applications across low biomass samples such as skin and high microbial biomass datasets including fecal samples. We find that these methods are important for obtaining accurate results in low biomass samples (e.g., tissue, skin). Overall, we demonstrate that rigorous host filtration is a key component of privacy-minded analyses of patient microbiomes and provide computationally efficient pipelines for accomplishing this task on large-scale datasets.