research-article

The concordance filter: an adaptive model-free feature screening procedure

Published: 05 September 2023

Abstract

A new model-free and data-adaptive feature screening procedure, referred to as the concordance filter, is developed for ultrahigh-dimensional data. The concordance filter measures concordance between random vectors and adapts to several types of predictors and response variables. We apply it to feature screening problems arising in a wide range of real applications, including nonparametric regression and survival analysis. We show that the concordance filter enjoys the sure screening and rank consistency properties under weak regularity conditions. In particular, it remains powerful in the presence of censoring and heavy tails. We further demonstrate its superior performance over existing screening methods through numerical examples and medical applications.
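
As a rough illustration of what concordance-based marginal screening can look like, the sketch below scores each feature by the absolute value of Kendall's tau with the response and keeps the top d features. This is not the authors' concordance filter, whose statistic is not given in the abstract; Kendall's tau is swapped in as a familiar concordance measure, and the names `kendall_tau`, `screen_features`, and the cutoff `d` are hypothetical.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sign of every pairwise difference. The product is +1 for a concordant
    # pair, -1 for a discordant pair, and 0 for a tie (or the diagonal).
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair appears twice in the sum, hence n * (n - 1).
    return float((dx * dy).sum() / (n * (n - 1)))


def screen_features(X, y, d):
    """Return the indices of the d features with largest |tau|, and all scores."""
    scores = np.array([abs(kendall_tau(X[:, j], y)) for j in range(X.shape[1])])
    top = np.argsort(-scores)[:d]
    return top, scores


# Hypothetical demo: heavy-tailed predictors, response driven by feature 5
# through a monotone nonlinear link (assumed setup, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_t(df=2, size=(60, 200))
y = X[:, 5] ** 3
top, scores = screen_features(X, y, d=10)
print("selected features:", top)
```

Because the score depends only on the ordering of the observations, it is unaffected by monotone transformations and is not dominated by extreme values, which is one reason rank-based measures are attractive under heavy tails. The pairwise computation costs O(n^2) per feature; `scipy.stats.kendalltau` offers a faster implementation (tau-b by default).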

Published In

Computational Statistics, Volume 39, Issue 5
Jul 2024
477 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 05 September 2023
Accepted: 07 August 2023
Received: 28 September 2022

Author Tags

  1. Concordance filter
  2. Sure independent screening
  3. High-dimensional data
  4. Model-free
  5. Data-adaptive

Qualifiers

  • Research-article
