Journal of the Royal Statistical Society Series A: Statistics in Society, 2018
Administrative data are becoming increasingly important. They are typically a side effect of some operational exercise and are often seen as having significant advantages over alternative sources of data. Although such data do have merits, statisticians should approach their analysis with the same cautious and critical eye as they would data from any other source. The paper identifies some statistical challenges, with the aims of stimulating debate about, and improving, the analysis of administrative data, and of encouraging methodology researchers to explore some of the important statistical problems that arise with such data.
Proceedings of the 40th International Conference on Software Engineering, 2018
In goal-oriented requirements engineering approaches, conflict analysis has been proposed as an abstraction for risk analysis. Intuitively, given a set of expected goals to be achieved by the system-to-be, a conflict represents a subtle situation that makes the goals diverge, i.e., makes them unsatisfiable as a whole. Conflict analysis is typically driven by the identify-assess-control cycle, which aims to identify, assess and resolve conflicts that may obstruct the satisfaction of the expected goals. In particular, the assessment step evaluates how likely the identified conflicts are, and how likely and how severe their consequences are. So far, existing assessment approaches restrict their analysis to obstacles (conflicts that prevent the satisfaction of a single goal) and assume that probabilistic information about the domain is available, which must first be elicited from experienced users, statistical data or simulations. In this paper, we present a novel automated approach for assessing how likely a conflict is; it applies to general conflicts (not only obstacles) and requires no probabilistic information about the domain. Given the LTL formulation of the domain and of a set of goals to be achieved, we compute goal conflicts and exploit string model counting techniques to estimate the likelihood that the corresponding conflicting situations occur and the severity with which they affect the satisfaction of the goals. This information can then be used to prioritize the conflicts to be resolved and to suggest which goals to focus on for refinement. CCS Concepts: • Software and its engineering → Requirements analysis; Risk management; • Theory of computation → Modal and temporal logics; Regular languages.
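The counting step can be pictured with a small automaton. The following is a minimal sketch, not the paper's tool chain: it assumes a conflict has already been translated into a deterministic finite automaton (DFA) over a finite alphabet of events, counts the accepted strings of a fixed length by dynamic programming, and treats all strings of that length as equally likely when turning the count into a likelihood. The DFA, the alphabet and the function names `count_accepted` and `likelihood` are hypothetical.

```python
from collections import defaultdict

def count_accepted(transitions, start, accepting, alphabet, length):
    """Count strings of exactly `length` symbols accepted by a DFA.

    `transitions` maps (state, symbol) -> state. Counts are pushed forward
    one symbol at a time, so the cost is O(length * |states| * |alphabet|)
    rather than enumerating all |alphabet| ** length strings.
    """
    counts = {start: 1}
    for _ in range(length):
        nxt = defaultdict(int)
        for state, c in counts.items():
            for symbol in alphabet:
                nxt[transitions[(state, symbol)]] += c
        counts = nxt
    return sum(c for state, c in counts.items() if state in accepting)

# Hypothetical DFA: traces in which event "a" is immediately followed by "b"
# (they contain the substring "ab"); state 2 is absorbing and accepting.
ALPHABET = ("a", "b")
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}

def likelihood(length):
    """Fraction of all traces of the given length that the DFA accepts."""
    hits = count_accepted(DELTA, 0, {2}, ALPHABET, length)
    return hits / len(ALPHABET) ** length

print(count_accepted(DELTA, 0, {2}, ALPHABET, 3), likelihood(3))  # 4 0.5
```

The uniform weighting over traces is the simplest choice; a real assessment would also need a principled bound on trace length and, for severity, a comparison of counts across goals.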
In this paper, we propose a new method for the supervised classification of hyperspectral images when the training sample is small. It consists of replacing the maximum likelihood estimator of the precision matrix in the Mahalanobis distance with a sparse estimator. The method is compared with two other existing sparse versions of LDA on both real and simulated images.
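The idea of plugging a sparse precision matrix into the Mahalanobis distance can be sketched as below, with pixels as rows and spectral bands as columns. This is an illustration under stated assumptions, not the paper's estimator: as a crude stand-in for a proper sparse estimator (such as the graphical lasso), `sparse_precision` inverts a ridge-regularized pooled covariance and hard-thresholds small off-diagonal entries. The names `sparse_precision`, `fit`, `predict` and the `ridge`/`threshold` values are hypothetical.

```python
import numpy as np

def sparse_precision(S, ridge=1e-2, threshold=0.05):
    """Crude sparse precision estimate (illustrative stand-in only):
    invert a ridge-regularized covariance, then set small off-diagonal
    entries to zero. The result is not guaranteed to be positive definite."""
    p = S.shape[0]
    omega = np.linalg.inv(S + ridge * np.eye(p))
    small = np.abs(omega) < threshold
    np.fill_diagonal(small, False)   # keep the diagonal untouched
    omega[small] = 0.0
    return omega

def fit(X, y):
    """Class means and a sparse estimate of the pooled (LDA) precision matrix.
    Rows of X are pixels, columns are spectral bands."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    centred = X - means[np.searchsorted(classes, y)]
    S = centred.T @ centred / (len(y) - len(classes))
    return classes, means, sparse_precision(S)

def predict(model, X):
    """Assign each row of X to the class with the smallest squared
    Mahalanobis distance computed with the sparse precision matrix."""
    classes, means, omega = model
    diffs = X[:, None, :] - means[None, :, :]                # (n, K, p)
    d2 = np.einsum("nkp,pq,nkq->nk", diffs, omega, diffs)   # (n, K)
    return classes[np.argmin(d2, axis=1)]
```

Because every class shares one precision matrix, the decision rule stays linear in the pixel vector, as in LDA; sparsity only changes which band interactions enter the distance.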
We introduce and compare several robust procedures for bandwidth selection when estimating the variance function. These bandwidth selectors are intended to be used in combination with the robust scale estimates introduced by Boente et al. (2010a). We consider two different robust cross-validation strategies combined with two ways of measuring the cross-validation prediction error. The different proposals are compared with non-robust alternatives by Monte Carlo simulation. We also derive some asymptotic results to investigate the large-sample performance of the corresponding robust data-driven scale estimators.
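How a robust cross-validation criterion can drive bandwidth choice for a scale function is sketched below. This is a hypothetical illustration, not the procedure of the paper nor the estimator of Boente et al. (2010a): it uses Rice-type pseudo-residuals from consecutive differences, a local median of their absolute values as the scale estimate, leave-one-out prediction of each |r_i|, and, as the robust error measure, the median (rather than the mean) of the absolute log-ratios between |r_i| and its prediction. The function name and the error measure are assumptions.

```python
import numpy as np

def robust_cv_bandwidth(x, y, bandwidths):
    """Pick h from `bandwidths` by leave-one-out cross-validation of a local
    median scale estimate built from Rice-type pseudo-residuals.

    r_i = |y_{i+1} - y_i| / sqrt(2), located at the midpoints t_i. The local
    estimate at t_i (leaving r_i out) is the median of r_j over
    |t_j - t_i| <= h; its error is |log r_i - log(local estimate)|, and the
    errors are summarized by their median so that outliers have little effect.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.abs(np.diff(y)) / np.sqrt(2.0)
    t = (x[1:] + x[:-1]) / 2.0
    scores = []
    for h in bandwidths:
        errs = []
        for i in range(len(r)):
            mask = np.abs(t - t[i]) <= h
            mask[i] = False                  # leave the i-th residual out
            if mask.any() and r[i] > 0:
                # median|r| is about 0.6745 * sigma under normal errors,
                # so this is proportional to the local scale
                local = np.median(r[mask])
                if local > 0:
                    errs.append(abs(np.log(r[i]) - np.log(local)))
        scores.append(np.median(errs) if errs else np.inf)
    scores = np.array(scores)
    return bandwidths[int(np.argmin(scores))], scores
```

Replacing the median of the errors by their mean gives the corresponding non-robust criterion, which is the kind of comparison the abstract describes.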
We propose the use of a robust covariance estimator based on multivariate Winsorization in the context of the Tarr–Müller–Weber framework for sparse estimation of the precision matrix of a Gaussian graphical model. Like Croux–Öllerer's precision matrix estimator, our proposed estimator attains the maximum finite-sample breakdown point of 0.5 under cellwise contamination. We conduct an extensive Monte Carlo simulation study to assess the performance of our estimator and of existing proposals. We find that our estimator performs competitively, both in estimating the precision matrix and in recovering the graph. We demonstrate the usefulness of the proposed methodology in a real application to breast cancer data.
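The role of Winsorization can be illustrated with a simplified, univariate (cellwise) version. This sketch is an assumption-laden stand-in, not the multivariate Winsorization estimator of the abstract: each column is centred at its median, scaled by its MAD, clipped to [-c, c], and the Pearson correlation of the clipped data is rescaled by the MAD scales. In a framework such as Tarr–Müller–Weber, a robust covariance of this kind would then be passed to a sparse precision estimator (for example, the graphical lasso), which is not shown. The name `winsorized_covariance` and the default `c=2.0` are hypothetical.

```python
import numpy as np

MAD_CONSISTENCY = 1.4826  # makes the MAD consistent for sigma at the normal

def winsorized_covariance(X, c=2.0):
    """Robust covariance via univariate (cellwise) Winsorization.

    Each column is centred at its median and scaled by its MAD, clipped to
    [-c, c], and the Pearson correlation of the clipped data is rescaled by
    the MAD scales: Sigma = D R D with D = diag(MAD).
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = MAD_CONSISTENCY * np.median(np.abs(X - med), axis=0)
    Z = np.clip((X - med) / mad, -c, c)   # outlying cells are pulled back to +-c
    R = np.corrcoef(Z, rowvar=False)
    return R * np.outer(mad, mad)
```

Because every entry comes from a single clipped data matrix, this simplified version is positive semidefinite by construction; pairwise robust estimators generally are not, which is why they are often followed by a projection onto positive semidefinite matrices.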
Journal of Data Science, Statistics, and Visualisation
We present a stepwise approach to estimating high-dimensional Gaussian graphical models. We exploit the relation between the partial correlation coefficients and the distribution of the prediction errors, and parametrize the model in terms of the Pearson correlation coefficients between the prediction errors of the nodes' best linear predictors. We propose a novel stepwise algorithm for detecting pairs of conditionally dependent variables. We compare the proposed algorithm with existing methods, including the graphical lasso (Glasso), constrained ℓ1-minimization (CLIME) and equivalent partial correlation (EPC), via simulation studies and real-life applications. In our simulation study we consider several model settings and report the results using different performance measures that capture desirable features of the recovered graph.
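The relation between prediction errors and partial correlations can be checked numerically. In the sample analogue sketched below (an illustration only; the function names are hypothetical and this is not the stepwise algorithm itself), each variable is regressed on all the others by least squares. With Ω the inverse sample covariance, the Pearson correlation between the prediction errors of nodes i and j equals ω_ij/√(ω_ii ω_jj), that is, minus the partial correlation ρ_ij·rest = -ω_ij/√(ω_ii ω_jj).

```python
import numpy as np

def residual_correlations(X):
    """Pearson correlations between the least-squares prediction errors of
    each variable regressed on all the other variables."""
    Xc = X - X.mean(axis=0)
    E = np.empty_like(Xc)
    for i in range(Xc.shape[1]):
        others = np.delete(Xc, i, axis=1)
        beta, *_ = np.linalg.lstsq(others, Xc[:, i], rcond=None)
        E[:, i] = Xc[:, i] - others @ beta      # prediction error of node i
    return np.corrcoef(E, rowvar=False)

def partial_correlations(X):
    """Partial correlations from the inverse sample covariance (precision)."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(omega))
    pc = -omega / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc
```

Off the diagonal the two matrices agree up to sign, so a zero correlation between two nodes' prediction errors corresponds to a missing edge in the graph.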
In this work we study the asymptotic behavior of a robust class of estimators of the coefficients of an AR-2D process. We establish precise conditions for the consistency and asymptotic normality of the RA estimator. The AR-2D model has many applications in image modeling and statistical image processing, which makes these properties relevant. The adequacy of the
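For readers unfamiliar with the model, a first-order unilateral (quarter-plane) AR-2D process, X[i,j] = φ1 X[i-1,j] + φ2 X[i,j-1] + φ3 X[i-1,j-1] + ε[i,j], can be simulated and fitted as below. The model form is an assumption, since the abstract does not state it, and ordinary least squares is the classical, non-robust baseline; it is not the RA estimator studied in the paper. The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar2d(phi, shape, burn=50, seed=0):
    """Simulate X[i,j] = phi1*X[i-1,j] + phi2*X[i,j-1] + phi3*X[i-1,j-1] + e[i,j]
    with standard normal innovations, discarding `burn` rows and columns."""
    rng = np.random.default_rng(seed)
    m, n = shape[0] + burn, shape[1] + burn
    X = np.zeros((m, n))
    e = rng.standard_normal((m, n))
    for i in range(1, m):
        for j in range(1, n):
            X[i, j] = (phi[0] * X[i - 1, j] + phi[1] * X[i, j - 1]
                       + phi[2] * X[i - 1, j - 1] + e[i, j])
    return X[burn:, burn:]

def fit_ar2d_ls(X):
    """Least-squares estimates of (phi1, phi2, phi3) from all interior sites."""
    y = X[1:, 1:].ravel()
    Z = np.column_stack([X[:-1, 1:].ravel(),    # X[i-1, j]
                         X[1:, :-1].ravel(),    # X[i, j-1]
                         X[:-1, :-1].ravel()])  # X[i-1, j-1]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef
```

Least squares weights every pixel equally, so a few gross outliers in an image can distort these estimates; that sensitivity is what motivates robust alternatives such as the RA estimator.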
Journal of Statistical Planning and Inference, 2008
When the data used to fit a nonparametric regression model are contaminated with outliers, we need to use a robust estimator of scale in order to make robust estimation of the regression function possible. We develop a family of M-estimators of scale constructed from consecutive differences of regression responses. Estimators in our family robustify the estimator proposed by Rice [1984. Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215–1230]. Under appropriate conditions, we establish the weak consistency and asymptotic normality of all estimators in our family. Estimators in our family vary in terms of their robustness properties. We quantify the robustness of each estimator via the maxbias, and we use this measure as a basis for deriving the asymptotic breakdown point of the estimator. Our theoretical results allow us to specify conditions for estimators in our family to achieve a maximum asymptotic breakdown point of 1/2. We conduct a simulation study to compare the finite-sample performance of our preferred M-estimator with that of three other estimators.
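The contrast between Rice's estimator, σ̂² = Σ (y_{i+1} - y_i)² / (2(n-1)), and an M-estimator of scale built from the same consecutive differences can be sketched as follows. The choice of ρ (Tukey's bisquare with c = 1.547 and b = 0.5, a common 50%-breakdown setting) and the fixed-point algorithm are assumptions for illustration; the abstract does not say which member of the family is preferred. The function names are hypothetical.

```python
import numpy as np

def rho_bisquare(t, c=1.547):
    """Tukey bisquare rho, normalized so that rho(inf) = 1."""
    u = np.minimum(np.abs(t) / c, 1.0)
    return 1.0 - (1.0 - u ** 2) ** 3

def rice_scale(y):
    """Rice (1984) difference-based estimator: sqrt(sum(diff^2) / (2(n-1)))."""
    d = np.diff(y)
    return np.sqrt(np.sum(d ** 2) / (2.0 * len(d)))

def m_scale_differences(y, b=0.5, c=1.547, tol=1e-10, max_iter=200):
    """M-estimator of scale applied to d_i = (y_{i+1} - y_i) / sqrt(2):
    solves mean(rho(d_i / s)) = b by the usual fixed-point iteration."""
    d = np.diff(y) / np.sqrt(2.0)
    s = np.median(np.abs(d)) / 0.6745          # MAD-type starting value
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(d / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s
```

Because ρ is bounded, each gross outlier can only add a bounded amount to the estimating equation, whereas in Rice's estimator it enters squared; note also that one outlying response contaminates two consecutive differences.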
When the data used to fit a heteroscedastic nonparametric regression model are contaminated with outliers, robust estimators of the scale function are needed to obtain robust estimators of the regression function and to construct robust confidence bands. In this paper, we consider local M-estimators of the scale function based on consecutive differences of the responses, for fixed designs.
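A local version replaces the plain average in the M-scale equation by kernel weights centred at the point of interest x0, so the estimate tracks a scale function σ(x). The sketch below assumes an Epanechnikov kernel, Tukey's bisquare ρ with c = 1.547 and b = 0.5, and a fixed-point solver; these are illustrative choices, not necessarily those of the paper, and the name `local_m_scale` is hypothetical.

```python
import numpy as np

def rho_bisquare(t, c=1.547):
    """Tukey bisquare rho, normalized so that rho(inf) = 1."""
    u = np.minimum(np.abs(t) / c, 1.0)
    return 1.0 - (1.0 - u ** 2) ** 3

def local_m_scale(x, y, x0, h, b=0.5, c=1.547, tol=1e-10, max_iter=200):
    """Local M-scale at x0 from d_i = (y_{i+1} - y_i) / sqrt(2) on a fixed,
    ordered design. Each d_i sits at the midpoint t_i of x_i and x_{i+1}
    and gets Epanechnikov weight K((t_i - x0) / h). Solves
    sum(w_i * rho(d_i / s)) / sum(w_i) = b by fixed-point iteration."""
    d = np.diff(y) / np.sqrt(2.0)
    t = (x[1:] + x[:-1]) / 2.0
    u = (t - x0) / h
    w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    if w.sum() == 0.0:
        raise ValueError("no design points within bandwidth h of x0")
    w = w / w.sum()
    active = w > 0
    d, w = d[active], w[active]
    s = np.median(np.abs(d)) / 0.6745          # MAD-type starting value
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.sum(w * rho_bisquare(d / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s
```

Evaluating `local_m_scale` on a grid of x0 values traces out an estimate of the scale function; the bandwidth h controls the usual bias-variance trade-off.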
Papers by Marcelo Ruiz