Search | arXiv e-print repository

A Scalar-on-Quantile-Function Approach for Estimating Short-term Health Effects of Environmental Exposures

Authors: Yuzi Zhang, Howard H. Chang, Joshua L. Warren, Stefanie T. Ebelt

Abstract: Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial and temporal unit of interest (e.g., within day or ZIP code). We propose a general modeling approach to incorporate within-unit exposure heterogeneity in health analyses via exposure quantile functions. Furthermore, by viewing the exposure quantile function as a functional covariate, our approach provides additional flexibility in characterizing associations at different quantile levels. We apply the proposed approach to an analysis of air pollution and emergency department (ED) visits in Atlanta over four years. The analysis utilizes daily ZIP code-level distributions of personal exposures to four traffic-related ambient air pollutants simulated from the Stochastic Human Exposure and Dose Simulator. Our analyses find that effects of carbon monoxide on respiratory and cardiovascular disease ED visits are more pronounced with changes in lower quantiles of the population-level exposure. Software for implementation is provided in the R package nbRegQF.

Submitted 4 February, 2023; originally announced February 2023.

arXiv:2203.16627

A Bayesian framework for incorporating exposure uncertainty into health analyses with application to air pollution and stillbirth

Authors: Saskia Comess, Howard H. Chang, Joshua L. Warren

Abstract: Studies of the relationships between environmental exposures and adverse health outcomes often rely on a two-stage statistical modeling approach, where exposure is modeled/predicted in the first stage and used as input to a separately fit health outcome analysis in the second stage. Uncertainty in these predictions is frequently ignored, or accounted for in an overly simplistic manner, when estimating the associations of interest. Working in the Bayesian setting, we propose a flexible kernel density estimation (KDE) approach for fully utilizing posterior output from the first stage modeling/prediction to make accurate inference on the association between exposure and health in the second stage, derive the full conditional distributions needed for efficient model fitting, detail its connections with existing approaches, and compare its performance through simulation. Our KDE approach is shown to generally have improved performance across several settings and model comparison metrics. Using competing approaches, we investigate the association between lagged daily ambient fine particulate matter levels and stillbirth counts in New Jersey (2011-2015), observing an increase in risk with elevated exposure three days prior to delivery. The newly developed methods are available in the R package KDExp.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2109.14003

Spatial modeling of dyadic genetic relatedness data: Identifying factors associated with M. tuberculosis transmission in Moldova

Authors: Joshua L. Warren, Melanie H. Chitwood, Benjamin Sobkowiak, Valeriu Crudu, Caroline Colijn, Ted Cohen

Abstract: Understanding factors that contribute to the increased likelihood of disease transmission between two individuals is important for infection control. However, analyzing measures of genetic relatedness is complicated due to correlation arising from the presence of the same individual across multiple dyadic outcomes, potential spatial correlation caused by unmeasured transmission dynamics, and the distinctive distributional characteristics of some of the outcomes. We develop two novel hierarchical Bayesian spatial methods for analyzing dyadic genetic relatedness data, in the form of patristic distances and transmission probabilities, that simultaneously address each of these complications. Using individual-level spatially correlated random effect parameters, we account for multiple sources of correlation between the outcomes as well as other important features of their distribution. Through simulation, we show the limitations of existing approaches in terms of estimating key associations of interest, and the ability of the new methodology to correct for these issues across datasets with different levels of correlation. All methods are applied to Mycobacterium tuberculosis data from the Republic of Moldova where we identify previously unknown factors associated with disease transmission and, through analysis of the random effect parameters, key individuals and areas with increased transmission activity. Model comparisons show the importance of the new methodology in this setting. The methods are implemented in the R package GenePair.

Submitted 22 August, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

arXiv:2104.09730

Critical Window Variable Selection for Mixtures: Estimating the Impact of Multiple Air Pollutants on Stillbirth

Authors: Joshua L. Warren, Howard H. Chang, Lauren K. Warren, Matthew J. Strickland, Lyndsey A. Darrow, James A. Mulholland

Abstract: Understanding the role of time-varying pollution mixtures on human health is critical as people are simultaneously exposed to multiple pollutants during their lives. For vulnerable sub-populations who have well-defined exposure periods (e.g., pregnant women), questions regarding critical windows of exposure to these mixtures are important for mitigating harm. We extend Critical Window Variable Selection (CWVS) to the multipollutant setting by introducing CWVS for Mixtures (CWVSmix), a hierarchical Bayesian method that combines smoothed variable selection and temporally correlated weight parameters to (i) identify critical windows of exposure to mixtures of time-varying pollutants, (ii) estimate the time-varying relative importance of each individual pollutant and their first order interactions within the mixture, and (iii) quantify the impact of the mixtures on health. Through simulation, we show that CWVSmix offers the best balance of performance in each of these categories in comparison to competing methods. Using these approaches, we investigate the impact of exposure to multiple ambient air pollutants on the risk of stillbirth in New Jersey, 2005-2014. We find consistent elevated risk in gestational weeks 2, 16-17, and 20 for non-Hispanic Black mothers, with pollution mixtures dominated by ammonium (weeks 2, 17, 20), nitrate (weeks 2, 17), nitrogen oxides (weeks 2, 16), PM2.5 (week 2), and sulfate (week 20). The method is available in the R package CWVSmix.

Submitted 19 April, 2021; originally announced April 2021.

arXiv:1811.11038

A spatially varying change points model for monitoring glaucoma progression using visual field data

Authors: Samuel I. Berchuck, Jean-Claude Mwanza, Joshua L. Warren

Abstract: Glaucoma disease progression, as measured by visual field (VF) data, is often defined by periods of relative stability followed by an abrupt decrease in visual ability at some point in time. Determining the transition point of the disease trajectory to a more severe state is important clinically for disease management and for avoiding irreversible vision loss. Based on this, we present a unified statistical modeling framework that permits prediction of the timing and spatial location of future vision loss and informs clinical decisions regarding disease progression. The developed method incorporates anatomical information to create a biologically plausible data-generating model. We accomplish this by introducing a spatially varying coefficients model that includes spatially varying change points to detect structural shifts in both the mean and variance process of VF data across both space and time. The VF location-specific change point represents the underlying, and potentially censored, timing of true change in disease trajectory while a multivariate spatial boundary detection structure is introduced that accounts for the complex spatial connectivity of the VF and optic disc. We show that our method improves estimation and prediction of multiple aspects of disease management in comparison to existing methods through simulation and real data application. The R package spCP implements the new methodology.

Submitted 27 November, 2018; originally announced November 2018.

Comments: This is a preprint of an article submitted for publication in Spatial Statistics (https://www.journals.elsevier.com/spatial-statistics). The article contains 42 pages, 4 figures, 5 tables and 1 video

arXiv:1805.11636

Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method

Authors: Samuel I. Berchuck, Jean-Claude Mwanza, Joshua L. Warren

Abstract: Diagnosing glaucoma progression is critical for limiting irreversible vision loss. A common method for assessing glaucoma progression uses a longitudinal series of visual fields (VF) acquired at regular intervals. VF data are characterized by a complex spatiotemporal structure due to the data generating process and ocular anatomy. Thus, advanced statistical methods are needed to make clinical determinations regarding progression status. We introduce a spatiotemporal boundary detection model that allows the underlying anatomy of the optic disc to dictate the spatial structure of the VF data across time. We show that our new method provides novel insight into vision loss that improves diagnosis of glaucoma progression using data from the Vein Pulsation Study Trial in Glaucoma and the Lions Eye Institute trial registry. Simulations are presented, showing the proposed methodology is preferred over existing spatial methods for VF data. Supplementary materials for this article are available online and the method is implemented in the R package womblR.

Submitted 29 May, 2018; originally announced May 2018.

Comments: This is a preprint of an article submitted for publication in the Journal of the American Statistical Association (https://www.tandfonline.com/toc/uasa20/current). The article contains 35 pages, 4 figures and 3 tables

arXiv:1803.06393

Phylogeny-based tumor subclone identification using a Bayesian feature allocation model

Authors: Li Zeng, Joshua L. Warren, Hongyu Zhao

Abstract: Tumor cells acquire different genetic alterations during the course of evolution in cancer patients. As a result of competition and selection, only a few subgroups of cells with distinct genotypes survive. These subgroups of cells are often referred to as subclones. In recent years, many statistical and computational methods have been developed to identify tumor subclones, leading to biologically significant discoveries and shedding light on tumor progression, metastasis, drug resistance and other processes. However, most existing methods are either not able to infer the phylogenetic structure among subclones, or not able to incorporate copy number variations (CNV). In this article, we propose SIFA (tumor Subclone Identification by Feature Allocation), a Bayesian model which takes into account both CNV and tumor phylogeny structure to infer tumor subclones. We compare the performance of SIFA with two other commonly used methods using simulation studies with varying sequencing depth, evolutionary tree size, and tree complexity. SIFA consistently yields better results in terms of Rand Index and cellularity estimation accuracy. The usefulness of SIFA is also demonstrated through its application to whole genome sequencing (WGS) samples from four patients in a breast cancer study.

Submitted 16 March, 2018; originally announced March 2018.

Comments: 35 pages, 11 figures

arXiv:1609.02984

A Bayesian Semiparametric Factor Analysis Model for Subtype Identification

Authors: Jiehuan Sun, Joshua L. Warren, Hongyu Zhao

Abstract: Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.

Submitted 25 September, 2016; v1 submitted 9 September, 2016; originally announced September 2016.

Comments: This paper has been withdrawn by the author because it was submitted without the consent of all authors

MSC Class: 62H30

arXiv:1609.02980

A Dirichlet Process Mixture Model for Clustering Longitudinal Gene Expression Data

Authors: Jiehuan Sun, Jose D. Herazo-Maya, Naftali Kaminski, Hongyu Zhao, Joshua L. Warren

Abstract: Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide additional information on disease progression than what is captured by baseline profiles alone. Moreover, the longitudinal gene expression data allows for intra-individual variability to be accounted for when grouping patients. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel subgroup identification method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of genes over time while clustering is jointly conducted based on the regression coefficients obtained from all genes. In order to account for the correlations among genes and alleviate the high dimensionality challenges, we adopt a factor analysis model for the regression coefficients. The Dirichlet process prior distribution is utilized for the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to distinguish burn patients from trauma patients and identify interesting subgroups in trauma patients.

Submitted 25 September, 2016; v1 submitted 9 September, 2016; originally announced September 2016.

Comments: This paper has been withdrawn by the author because it was submitted without the consent of all authors

MSC Class: 62G05

Showing 1–9 of 9 results for author: Warren, J L