Abstract
We propose a semiparametric framework based on sliced inverse regression (SIR) to address the issue of variable selection in functional regression. SIR is an effective dimension-reduction method that computes a linear projection of the predictors onto a low-dimensional space without loss of information on the regression. To deal with the high dimensionality of the predictors, we consider penalized versions of SIR: ridge and sparse. We extend the variable-selection approaches developed for multidimensional SIR to select intervals that form a partition of the domain of the functional predictors. Selecting entire intervals rather than separate evaluation points improves the interpretability of the estimated coefficients in the functional framework. A fully automated iterative procedure is proposed to find the critical (interpretable) intervals. The approach is shown to be efficient on simulated and real data. The method is implemented in the R package SISIR, available on CRAN at https://cran.r-project.org/package=SISIR.
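To fix ideas, the following minimal, self-contained R sketch illustrates the ridge-penalized SIR step that the procedure builds on, applied to simulated discretized curves. It does not use the SISIR package API; the data-generating model, grid size, number of slices and the value of the ridge parameter mu2 are illustrative assumptions.

## Minimal sketch of ridge SIR on simulated discretized functional predictors
## (illustration only; the SISIR package implements the full procedure,
## including the interval selection)
set.seed(1)
n <- 200; p <- 100; H <- 10                              # sample size, grid size, slices
tgrid <- seq(0, 1, length.out = p)
X <- t(replicate(n, cumsum(rnorm(p)) / sqrt(p)))         # rough Brownian-like curves
beta <- sin(2 * pi * tgrid) * (tgrid > 0.2 & tgrid < 0.5) # relevant points lie in one interval
y <- as.vector(exp(X %*% beta / p)) + rnorm(n, sd = 0.1)  # single-index model, nonlinear link

## Standard SIR ingredients: slice the response, compute the between-slice covariance
slices <- cut(y, breaks = quantile(y, probs = seq(0, 1, length.out = H + 1)),
              include.lowest = TRUE)
xbar  <- colMeans(X)
Gamma <- matrix(0, p, p)
for (h in levels(slices)) {
  idx   <- which(slices == h)
  dh    <- colMeans(X[idx, , drop = FALSE]) - xbar
  Gamma <- Gamma + (length(idx) / n) * tcrossprod(dh)
}

## Ridge SIR: EDR directions are leading eigenvectors of (Sigma + mu2 * I)^{-1} Gamma
Sigma <- cov(X)
mu2   <- 0.1                                              # ridge parameter (to be tuned, e.g., by CV)
edr   <- Re(eigen(solve(Sigma + mu2 * diag(p), Gamma))$vectors[, 1])
plot(tgrid, edr, type = "l", xlab = "t", ylab = "first EDR direction")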
Acknowledgements
The authors thank the two anonymous referees for relevant remarks and constructive comments on a previous version of the paper.
Appendices
Equivalent expressions for \(R^2(d)\)
In this section, we show that \(R^2(d) = \frac{1}{2}\mathbb{E} \left\| \varPi_d - \widehat{\varPi}_d \right\|^2_F\). We have
$$ \frac{1}{2}\mathbb{E} \left\| \varPi_d - \widehat{\varPi}_d \right\|^2_F = \frac{1}{2}\mathbb{E} \left[ \left\| \varPi_d \right\|^2_F + \left\| \widehat{\varPi}_d \right\|^2_F - 2 \left\langle \varPi_d,\, \widehat{\varPi}_d \right\rangle_F \right]. $$
The squared norm of an M-orthogonal projector onto a space of dimension d is equal to d; we thus have that
$$ \frac{1}{2}\mathbb{E} \left\| \varPi_d - \widehat{\varPi}_d \right\|^2_F = d - \mathbb{E} \left\langle \varPi_d,\, \widehat{\varPi}_d \right\rangle_F = R^2(d), $$
which concludes the proof.
Joint choice of the parameters \(\mu _2\) and d
Notations:
- \(\mathcal{L}_l\) denotes the observations in fold number l and \(\overline{\mathcal{L}_l}\) the remaining observations;
- \(\hat{A}(\mathcal{L}, \mu_2, d)\) and \(\hat{C}(\mathcal{L}, \mu_2, d)\) are the minimizers of the ridge regression problem restricted to the observations \(i \in \mathcal{L}\). Note that for \(d_1 < d_2\), \(\hat{A}(\mathcal{L}, \mu_2, d_1)\) consists of the first \(d_1\) columns of \(\hat{A}(\mathcal{L}, \mu_2, d_2)\) (and similarly for \(\hat{C}(\mathcal{L}, \mu_2, d)\));
- \(\hat{p}_h^\mathcal{L}\), \(\overline{X}_h^\mathcal{L}\), \(\overline{X}^\mathcal{L}\) and \(\widehat{\varSigma}^\mathcal{L}\) are, respectively, the slice frequencies, the within-slice means of X, the overall mean of X and the covariance of X, computed from the observations \(i \in \mathcal{L}\);
- \(\widehat{\varPi}_{d,\mu_2}^{\mathcal{L}}\) is the \((\widehat{\varSigma}^{\mathcal{L}}+\mu_2\mathbb{I}_p)\)-orthogonal projector onto the space spanned by the first d columns of \(\hat{A}(\mathcal{L},\mu_2,d_0)\), and \(\widehat{\varPi}_{d,\mu_2}\) denotes \(\widehat{\varPi}_{d,\mu_2}^{\mathcal{L}}\) for \(\mathcal{L} = \{1,\,\ldots,\,n\}\).
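With these notations, the fold-wise projectors can be compared with the full-sample one to choose \(\mu_2\) and d jointly. A sketch of one such criterion, assuming the Frobenius comparison of the previous appendix and L folds (an illustrative form under these assumptions):
$$ \mathrm{CV}(\mu_2, d) = \frac{1}{L} \sum_{l=1}^{L} \frac{1}{2} \left\| \widehat{\varPi}_{d,\mu_2}^{\overline{\mathcal{L}_l}} - \widehat{\varPi}_{d,\mu_2} \right\|^2_F, $$
to be minimized jointly over \(\mu_2\) and d.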