We present a novel statistical framework for detecting pre-determined shape classes in 2D cluttered point clouds, which are in turn extracted from images. In this model-based approach, we use a 1D Poisson process for sampling points on shapes, a 2D Poisson process for points from background clutter, and an additive Gaussian model for noise. Combining these with a past stochastic model on shapes of continuous 2D contours, and optimizing over unknown pose and scale, we develop a generalized likelihood ratio test for shape detection. We demonstrate the efficiency of this method and its robustness to clutter using both simulated and real data.
Selecting important spatially dependent variables under the nonhomogeneous spatial Poisson process model is a topic of great current interest. In this paper, we use the Deviance Information Criterion (DIC) and the Logarithm of the Pseudo Marginal Likelihood (LPML) for Bayesian variable selection under the nonhomogeneous spatial Poisson process model. We further derive a new Monte Carlo estimation formula for LPML in the spatial Poisson process setting. Extensive simulation studies are carried out to evaluate the empirical performance of the proposed criteria. The proposed methodology is further applied to the analysis of two large data sets: earthquake data from the Earthquake Hazards Program of the United States Geological Survey (USGS) and the Barro Colorado Island (BCI) forest data.
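The standard Monte Carlo route to LPML from posterior draws is the harmonic-mean CPO estimator; the paper's new formula for the spatial Poisson setting may differ, so the sketch below only illustrates the generic estimator, assuming a hypothetical array of pointwise log-likelihoods evaluated at each posterior draw:

```python
import numpy as np

def lpml_harmonic_mean(loglik_draws):
    """Estimate LPML = sum_i log CPO_i from a (T, n) array of
    log f(y_i | theta_t) values (T posterior draws, n observations).
    CPO_i is estimated by the harmonic mean over draws of the
    likelihood f(y_i | theta_t), computed stably on the log scale
    via the logsumexp trick."""
    ll = np.asarray(loglik_draws, dtype=float)
    T = ll.shape[0]
    m = (-ll).max(axis=0)
    # log(1/CPO_i) = logsumexp_t(-ll[t, i]) - log T
    log_inv_cpo = m + np.log(np.exp(-ll - m).sum(axis=0)) - np.log(T)
    return float(-log_inv_cpo.sum())
```

Larger LPML indicates better predictive fit, so candidate models can be ranked by LPML alongside DIC.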
Age-dependent extinction is an observation with important biological implications. Van Valen's Red Queen hypothesis triggered three decades of research testing its primary implication: that extinction risk is independent of age. In contrast, later studies with species-level data have indicated the possible presence of age dependence. Since the formulation of the Red Queen hypothesis, more powerful tests of survivorship models have been developed. This is the first report of the application of the Cox Proportional Hazards model to paleontological data. Planktonic foraminiferal morphospecies allow the taxonomic and precise stratigraphic resolution necessary for the Cox model. As a whole, planktonic foraminiferal morphospecies clearly show age-dependent extinction. In particular, the effect is attributable to the presence of shorter-ranged species (range < 4 myr) following extinction events. These shorter-ranged species also possess tests with unique morphological architecture. The morphological differences are probably epiphenomena of underlying developmental and heterochronic processes of shorter-ranged species that survived various extinction events. Extinction survivors carry developmental and morphological characteristics into postextinction recovery times, and this sets them apart from species populations established independently of extinction events.
This study proposes a multilevel logistic regression model for evaluating a source of DIF (differential item functioning). The model accounts for the three-level nested structure of the data and combines the results of logistic regression analyses to identify level-3 unit characteristic variables that explain DIF variation. A simulation study is presented to assess the adequacy of the proposed models. The parameters of the proposed models were estimated using a Bayesian approach implemented in WinBUGS 1.4.
Let C_w denote the number of m:w clumps among N random points uniformly distributed in the interval (0,1]. (We say that an m:w clump exists when m points fall within an interval of length w.) The previous chapter described how to compute the lower-order moments of C_w. In the present chapter, we discuss ways these moments can be used to obtain bounds and approximations for the distribution of C_w.
For modeling the distribution of plant species in terms of climate covariates, we consider an autologistic regression model for spatial binary data on a regularly spaced lattice. This model belongs to the class of autologistic models introduced by Besag (1974). Three estimation methods, the coding method, the maximum pseudolikelihood method, and the Markov chain Monte Carlo method, are studied and compared via simulation.
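The pseudolikelihood ingredient of such a fit can be sketched in a few lines; maximum pseudolikelihood then amounts to minimizing this function over the parameters. The intercept/covariate/spatial-dependence parameterization and the 4-nearest-neighbor structure below are illustrative assumptions, not necessarily the paper's exact model:

```python
import numpy as np

def neg_log_pseudolik(params, y, x):
    """Negative log-pseudolikelihood for an autologistic model on a
    regular lattice. y: (r, c) binary array; x: (r, c) covariate.
    params = (b0, b1, eta): intercept, covariate effect, and spatial
    dependence. Each site is modeled conditionally on its 4 nearest
    neighbors: logit P(y_ij = 1 | rest) = b0 + b1*x_ij + eta*S_ij,
    where S_ij is the sum of the neighboring y values."""
    b0, b1, eta = params
    s = np.zeros_like(y, dtype=float)
    s[1:, :] += y[:-1, :]    # neighbor above
    s[:-1, :] += y[1:, :]    # neighbor below
    s[:, 1:] += y[:, :-1]    # neighbor to the left
    s[:, :-1] += y[:, 1:]    # neighbor to the right
    lin = b0 + b1 * x + eta * s
    # log P(y_ij | rest) = y*lin - log(1 + exp(lin)), summed over sites
    return float(-(y * lin - np.logaddexp(0.0, lin)).sum())
```

Passing this to a generic optimizer (e.g. `scipy.optimize.minimize`) yields the maximum pseudolikelihood estimate; the coding method instead maximizes the same conditional terms over a non-adjacent subset of sites.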
Consider the order statistics from N i.i.d. random variables uniformly distributed on the interval (0,1]. We present a general method for computing probabilities involving differences of the order statistics or linear combinations of the spacings between the order statistics. This method is based on repeated use of a basic recursion to break up the joint distribution of linear combinations of spacings into ...
Journal of the American Statistical Association, 1997
Let X_1, X_2, ..., X_n be randomly distributed points on the unit interval. Let N_{x,x+d} be the number of these points contained in the interval (x, x+d). The scan statistic N_d is defined as the maximum number of points in a window of length d, that is, N_d = sup_x N_{x,x+d}. This statistic is used to test for the presence of non-random clustering. We say that m points form an m:d clump if these points are all contained in some interval of length d. Let Y denote the number of m:d ...
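For an observed sample the scan statistic is easy to compute directly, because the supremum is attained by a window anchored at one of the points. A minimal sketch (using closed windows of length d, which differs from the open-interval definition only on boundary ties):

```python
import numpy as np

def scan_statistic(points, d):
    """Scan statistic N_d: the maximum number of points falling in
    any window of length d. It suffices to check, for each point,
    the window starting at that point."""
    pts = np.sort(np.asarray(points, dtype=float))
    best = 0
    for i in range(len(pts)):
        # index just past the last point <= pts[i] + d
        j = np.searchsorted(pts, pts[i] + d, side="right")
        best = max(best, j - i)
    return best
```

A large value of N_d relative to its null distribution under uniformity is evidence of clustering.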
Communications in Statistics - Simulation and Computation, 2011
We study a weighted least squares estimator for Aalen's additive risk model with right-censored survival data which allows for very flexible handling of covariates. We divide the follow-up period into intervals and assume a constant hazard rate in each interval. The model is motivated as a piecewise approximation of a hazard function composed of three parts: arbitrary nonparametric functions for some covariate effects, smoothly varying functions for others, and known (or constant) functions for yet others. The proposed estimator is an extension of the grouped-data version of the Huffer and McKeague (1991) estimator. For our model, since the number of parameters is finite (although large), conventional approaches (such as maximum likelihood) are easy to formulate and implement. The approach is illustrated by simulations and compared to previous studies. The method is also applied to the Framingham study data.
We define a chi-squared statistic for p-dimensional data as follows. First, we transform the data to remove the correlations between the p variables.
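One common construction of such a statistic, sketched here under the assumption that the decorrelating transform uses a known mean and covariance: whiten the observation with a Cholesky factor of the covariance and sum the squares, which gives a statistic that is chi-squared with p degrees of freedom under multivariate normality.

```python
import numpy as np

def chi_squared_stat(x, mean, cov):
    """T = (x - mean)' cov^{-1} (x - mean), computed by whitening:
    solve L z = (x - mean) with cov = L L', then T = z'z.
    Under x ~ N(mean, cov), T is chi-squared with p = len(x) df."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    z = np.linalg.solve(np.linalg.cholesky(cov), diff)  # decorrelated
    return float(z @ z)
```

In practice the mean and covariance would typically be estimated from the data, which changes the reference distribution slightly.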
The problem of 3D chromosome structure inference from Hi-C datasets is important and challenging. While bulk Hi-C datasets contain contact information derived from millions of cells, and can capture major structural features shared by the majority of cells in the sample, they do not provide information about local variability between cells. Single-cell Hi-C can overcome this problem, but its contact matrices are generally very sparse, making structural inference more problematic. We have developed a Bayesian multiscale approach, named SIMBA3D, to infer 3D structures of chromosomes from single-cell Hi-C while including the bulk Hi-C data and some regularization terms as a prior. We study the landscape of solutions for each single-cell Hi-C dataset as a function of prior strength and demonstrate clustering of solutions using data from the same cell.
A common problem in genomics is to test for associations between two or more genomic features, typically represented as intervals interspersed across the genome. Existing methodologies can test for significant pairwise associations between two genomic intervals; however, they cannot test for associations involving multiple sets of intervals. This limits our ability to uncover more complex, yet biologically important associations between multiple sets of genomic features. We introduce GINOM (Genomic INterval Overlap Model), a new method that enables testing of significant associations between multiple genomic features. We demonstrate GINOM's ability to identify higher-order associations with both simulated and real data. In particular, we used GINOM to explore L1 retrotransposable element insertion bias in lung cancer and found a significant pairwise association between L1 insertions and heterochromatic marks. Unlike other methods, GINOM also detected an association between L1 in...
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010
We present a fully statistical framework for detecting pre-determined shape classes in 2D clouds of primitives (points, edges, and arcs), which are in turn extracted from images. An important goal is to provide a likelihood, and thus a confidence, of finding a shape class in given data. This requires a model-based approach. We use a composite Poisson process: a 1D Poisson process for primitives belonging to shapes and a 2D Poisson process for primitives belonging to clutter. An additive Gaussian model is assumed for noise in shape primitives. Combining these with a past stochastic model on shapes of continuous 2D contours, and optimizing over unknown pose and scale, we develop a generalized likelihood ratio test for shape detection. We demonstrate the efficiency of this method and its robustness to clutter using both simulated and real data.
Papers by Fred Huffer