Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content
eScholarship
Open Access Publications from the University of California

UCSF

UC San Francisco Previously Published Works bannerUCSF

Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data

Abstract

As next-generation sequencing technologies produce deeper genome coverages at lower costs, there is a critical need for reliable computational host DNA removal in metagenomic data. We find that insufficient host filtration using prior human genome references can introduce false sex biases and inadvertently permit flow-through of host-specific DNA during bioinformatic analyses, which could be exploited for individual identification. To address these issues, we introduce and benchmark three host filtration methods of varying throughput, with concomitant applications across low biomass samples such as skin and high microbial biomass datasets including fecal samples. We find that these methods are important for obtaining accurate results in low biomass samples (e.g., tissue, skin). Overall, we demonstrate that rigorous host filtration is a key component of privacy-minded analyses of patient microbiomes and provide computationally efficient pipelines for accomplishing this task on large-scale datasets.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View