Modelling spatially autocorrelated detection probabilities in spatial capture-recapture using random effects

Authors: Soumen Dey, Ehsan M. Moqanaki, Cyril Milleret, Pierre Dupont, Mahdieh Tourani, Richard Bischof

Abstract: Spatial capture-recapture (SCR) models are now widely used for estimating density from repeated individual spatial encounters. SCR accounts for the inherent spatial autocorrelation in individual detections by modelling detection probabilities as a function of distance between the detectors and individual activity centres. However, additional spatial heterogeneity in detection probability may still creep in due to environmental or sampling characteristics. If unaccounted for, such variation can lead to pronounced bias in population size estimates. Using simulations, we describe and test three Bayesian SCR models that use generalized linear mixed models (GLMM) to account for latent heterogeneity in baseline detection probability across detectors using: independent random effects (RE), spatially autocorrelated random effects (SARE), and a two-group finite mixture model (FM). Overall, SARE provided the least biased population size estimates (median relative bias (RB): -9% to 6%). When spatial autocorrelation was high, SARE also performed best at predicting the spatial pattern of heterogeneity in detection probability. At intermediate levels of autocorrelation, spatially explicit estimates of detection probability obtained with FM were more accurate than those generated by SARE and RE.
In cases where the number of detections per detector is realistically low (at most 1), all GLMMs considered here may require dimension reduction of the random effects by pooling baseline detection probability parameters across neighboring detectors ("aggregation") to avoid over-parameterization. The added complexity and computational overhead associated with SCR-GLMMs may only be justified in extreme cases of spatial heterogeneity. However, even in less extreme cases, detecting and estimating spatially heterogeneous detection probability may assist in planning or adjusting monitoring schemes.

Submitted 12 May, 2022; originally announced May 2022.

arXiv:1709.08461 [pdf, other]

Mining a Sub-Matrix of Maximal Sum

Authors: Vincent Branders, Pierre Schaus, Pierre Dupont

Abstract: Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive.

Submitted 25 September, 2017; originally announced September 2017.

Comments: 12 pages, 1 figure, Presented at NFMCP 2017, The 6th International Workshop on New Frontiers in Mining Complex Patterns, Skopje, Macedonia, Sep 22, 2017
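
Illustration (not from the paper): the max-sum sub-matrix problem described in the abstract above can be made concrete with a small exhaustive search. The sketch below is not the CPGC, MILP, or CP-LNS method; it is a brute-force baseline for tiny inputs, and the function name `max_sum_submatrix` is hypothetical. It relies on one simple observation: once a set of rows is fixed, the best columns are exactly those whose sum over the selected rows is positive. The running time is exponential in the number of rows.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Exhaustive search for a (not necessarily contiguous) max-sum sub-matrix.

    Returns (best_sum, selected_rows, selected_cols). Hypothetical helper for
    illustration only; exponential in the number of rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    best = (0, (), ())  # the empty selection has sum 0

    # Enumerate every non-empty subset of rows.
    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Column sums restricted to the selected rows.
            col_sums = [sum(matrix[r][c] for r in rows) for c in range(n_cols)]
            # For fixed rows, keeping exactly the positive-sum columns is optimal.
            cols = tuple(c for c, s in enumerate(col_sums) if s > 0)
            total = sum(col_sums[c] for c in cols)
            if total > best[0]:
                best = (total, rows, cols)
    return best


example = [
    [1, -2, 3],
    [-4, 5, -6],
    [7, -8, 9],
]
best_sum, best_rows, best_cols = max_sum_submatrix(example)
print(best_sum, best_rows, best_cols)
```

On this toy matrix the search selects rows 0 and 2 with columns 0 and 2, which are not adjacent; this is the sense in which the sub-matrix need not be contiguous.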

arXiv:1502.01493 [pdf, other]

A mixture Cox-Logistic model for feature selection from survival and classification data

Authors: Samuel Branders, Roberto D'Ambrosio, Pierre Dupont

Abstract: This paper presents an original approach for jointly fitting survival times and classifying samples into subgroups. The Coxlogit model is a generalized linear model with a common set of selected features for both tasks. Survival times and class labels are here assumed to be conditioned by a common risk score which depends on those features. Learning is then naturally expressed as maximizing the joint probability of subgroup labels and the ordering of survival events, conditioned on a common weight vector. The model is estimated by minimizing a regularized log-likelihood through a coordinate descent algorithm. Validation on synthetic and breast cancer data shows that the proposed approach outperforms a standard Cox model or logistic regression when both predicting the survival times and classifying new samples into subgroups. It is also better at selecting informative features for both tasks.

Submitted 5 February, 2015; originally announced February 2015.

Showing 1–3 of 3 results for author: Dupont, P