-
Clustering functional data with measurement errors: a simulation-based approach
Authors:
Tingyu Zhu,
Lan Xue,
Carmen Tekwe,
Keith Diaz,
Mark Benden,
Roger Zoh
Abstract:
Clustering analysis of functional data, which comprises observations that evolve continuously over time or space, has gained increasing attention across various scientific disciplines. Practical applications often involve functional data that are contaminated with measurement errors arising from imprecise instruments, sampling errors, or other sources. These errors can significantly distort the inherent data structure, resulting in erroneous clustering outcomes. In this paper, we propose a simulation-based approach designed to mitigate the impact of measurement errors. Our proposed method estimates the distribution of functional measurement errors through repeated measurements. Subsequently, the clustering algorithm is applied to simulated data generated from the conditional distribution of the unobserved true functional data given the observed contaminated functional data, accounting for the adjustments made to rectify measurement errors. We show through simulations that the proposed method has better numerical performance than naive methods that neglect such errors. Our proposed method was applied to a childhood obesity study, giving more reliable clustering results.
Submitted 17 June, 2024;
originally announced June 2024.
-
Adjusting for bias due to measurement error in functional quantile regression models with error-prone functional and scalar covariates
Authors:
Xiwei Chen,
Yuanyuan Luan,
Roger S. Zoh,
Lan Xue,
Sneha Jadhav,
Carmen D. Tekwe
Abstract:
Wearable devices enable the continuous monitoring of physical activity (PA) but generate complex functional data with poorly characterized errors. Most work on functional data views the data as smooth, latent curves observed at discrete time intervals with random noise that has mean zero and constant variance. Viewing this noise as homoscedastic and independent ignores potential serial correlations. Our preliminary studies indicate that failing to account for these serial correlations can bias estimates. In dietary assessments, epidemiologists often use self-reported measures based on food frequency questionnaires that are prone to recall bias. With the increased availability of complex, high-dimensional functional and scalar biomedical data potentially prone to measurement errors, it is necessary to adjust for biases induced by these errors to permit accurate analyses in various regression settings. However, there has been limited work to address measurement errors in functional and scalar covariates in the context of quantile regression. Therefore, we developed new statistical methods based on simulation extrapolation (SIMEX) and mixed effects regression with repeated measures to correct for measurement error biases in this context. We conducted simulation studies to establish the finite sample properties of our new methods. The methods are illustrated through application to a real data set.
Submitted 15 April, 2024;
originally announced April 2024.
-
Scalable regression calibration approaches to correcting measurement error in multi-level generalized functional linear regression models with heteroscedastic measurement errors
Authors:
Yuanyuan Luan,
Roger S. Zoh,
Erjia Cui,
Xue Lan,
Sneha Jadhav,
Carmen D. Tekwe
Abstract:
Wearable devices permit the continuous monitoring of biological processes, such as blood glucose metabolism, and behavior, such as sleep quality and physical activity. The continuous monitoring often occurs in epochs of 60 seconds over multiple days, resulting in high-dimensional longitudinal curves that are best described and analyzed as functional data. From this perspective, the functional data are smooth, latent functions obtained at discrete time intervals and prone to homoscedastic white noise. However, the assumption of homoscedastic errors might not be appropriate in this setting because the devices collect the data serially. While researchers have previously addressed measurement error in scalar covariates prone to errors, less work has been done on correcting measurement error in high-dimensional longitudinal curves prone to heteroscedastic errors. We present two new methods for correcting measurement error in longitudinal functional curves prone to complex measurement error structures in multi-level generalized functional linear regression models. These methods are based on two-stage scalable regression calibration. We assume that the distributions of the scalar responses and the surrogate measures prone to heteroscedastic errors both belong to the exponential family and that the measurement errors follow Gaussian processes. In simulations and sensitivity analyses, we established some finite sample properties of these methods. In our simulations, both regression calibration methods for correcting measurement error performed better than estimators based on averaging the longitudinal functional data and using observations from a single day. We also applied the methods to assess the relationship between physical activity and type 2 diabetes in community-dwelling adults in the United States who participated in the National Health and Nutrition Examination Survey.
Submitted 20 April, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Generalized functional linear regression models with a mixture of complex function-valued and scalar-valued covariates prone to measurement error
Authors:
Yuanyuan Luan,
Roger S. Zoh,
Sneha Jadhav,
Lan Xue,
Carmen D. Tekwe
Abstract:
While extensive work has been done to correct for biases due to measurement error in scalar-valued covariates prone to errors in generalized linear regression models, limited work has been done to address biases associated with functional covariates prone to errors or the combination of scalar and functional covariates prone to errors in these models. We propose Simulation Extrapolation (SIMEX) and Regression Calibration approaches to correct measurement errors associated with a mixture of functional and scalar covariates prone to classical measurement errors in generalized functional linear regression. The simulation extrapolation method is developed to handle the functional and scalar covariates prone to errors. We also develop methods based on regression calibration extended to our current measurement error settings. Extensive simulation studies are conducted to assess the finite sample performance of our developed methods. The methods are applied to the 2011-2014 cycles of the National Health and Nutrition Examination Survey data to assess the relationship of physical activity and total caloric intake with type 2 diabetes among community-dwelling adults living in the United States. We treat the device-based measures of physical activity as functional covariates prone to complex arbitrary heteroscedastic errors, while the total caloric intake is considered a scalar-valued covariate prone to error. We also examine the characteristics of observed measurement errors in device-based physical activity by important demographic subgroups including age, sex, and race.
Submitted 12 May, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
A Bayesian Semi-Parametric Scalar-On-Function Quantile Regression with Measurement Error using the GAL
Authors:
Roger S. Zoh,
Annie Yu,
Carmen Tekwe
Abstract:
Quantile regression provides a consistent approach to investigating the association between covariates and various aspects of the distribution of the response beyond the mean. When the regression covariates are measured with errors, measurement error (ME) adjustment steps are needed for valid inference. This is true for both scalar and functional covariates. Here, we propose extending the Bayesian measurement error and Bayesian quantile regression literature to allow for covariates prone to potentially complex measurement errors. Our approach uses the Generalized Asymmetric Laplace (GAL) distribution as a working likelihood. The GAL family of distributions has recently emerged as a more flexible family for Bayesian quantile regression modeling than its Asymmetric Laplace (AL) counterpart. We then compare and contrast two approaches to the ME adjustment step through a battery of simulation scenarios. Finally, we apply our approach to the analysis of the 2013-2014 NHANES dataset to model quantiles of body mass index (BMI) as a function of minute-level device-based physical activity in a cohort of adults aged 50 years and above.
Submitted 7 February, 2023;
originally announced February 2023.
-
A fully Bayesian semi-parametric scalar-on-function regression (SoFR) with measurement error using instrumental variables
Authors:
Roger S. Zoh,
Yuanyuan Luan,
Carmen Tekwe
Abstract:
Wearable devices such as the ActiGraph are now commonly used in health studies to monitor or track physical activity. This trend aligns well with the growing need to accurately assess the effects of physical activity on health outcomes such as obesity. When assessing the association between these device-based physical activity measures and health outcomes such as body mass index, the device-based data are treated as functions, while the outcome is scalar-valued. The regression model applied in these settings is the scalar-on-function regression (SoFR). Most estimation approaches in SoFR assume that the functional covariates are precisely observed, or the measurement errors are considered random errors. Violation of this assumption can lead to both under-estimation of the model parameters and sub-optimal analysis. Work on measurement-error-corrected approaches in SoFR is sparse in the non-Bayesian literature and virtually non-existent in the Bayesian literature. This paper considers a fully nonparametric Bayesian measurement error corrected SoFR model that relaxes all the constraining assumptions often made in these models. Our estimation relies on an instrumental variable (IV) to identify the measurement error model. We also introduce an IV quality scalar parameter that is jointly estimated along with all model parameters. Our method is easy to implement, and we demonstrate its finite sample properties through an extensive simulation study. Finally, the developed methods are applied to the National Health and Nutrition Examination Survey to assess the relationship between wearable-device-based measures of physical activity and body mass index among adults living in the United States.
Submitted 9 November, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
A Function-Based Approach to Model the Measurement Error in Wearable Devices
Authors:
Sneha Jadhav,
Carmen D. Tekwe,
Yuanyuan Luan
Abstract:
Physical activity (PA) is an important risk factor for many health outcomes. Wearable devices such as accelerometers are increasingly used in biomedical studies to understand the associations between PA and health outcomes. Statistical analyses involving accelerometer data are challenging due to the following three characteristics: (i) high-dimensionality, (ii) temporal dependence, and (iii) measurement error. To address these challenges, we treat accelerometer-based measures of physical activity as a single function-valued covariate prone to measurement error. Specifically, in order to determine the relationship between PA and a health outcome of interest, we propose a regression model with a functional covariate that accounts for measurement error. Using regression calibration, we develop a two-step estimation method for the model parameters and establish their consistency. A test is also proposed to assess the significance of the estimated model parameters. Simulation studies are conducted to compare the proposed methods with existing alternative approaches under varying scenarios. Finally, the developed methods are used to assess the relationship between PA intensity and BMI obtained from the National Health and Nutrition Examination Survey data.
Submitted 7 December, 2021;
originally announced December 2021.