Abstract
Boosting is one of the most important methods for fitting regression models and building prediction rules. A notable feature of boosting is that the technique can be modified to include a built-in mechanism for coefficient shrinkage and variable selection. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example, in count data models and in models with mixture-distributed outcome variables. As we demonstrate, the new algorithm is suitable both for estimating the prediction function and for regularizing the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.
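To illustrate the kind of model the abstract describes, the following R code is a minimal sketch assuming the mboost package, whose glmboost() fitting function and NBinomial() family pair a negative binomial log-likelihood loss with simultaneous estimation of the overdispersion nuisance parameter. The simulated data and all parameter settings are illustrative assumptions, not taken from the article.

library(mboost)

## Simulate overdispersed count data: only x1 and x2 enter the true predictor.
set.seed(1)
n <- 500; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
mu <- exp(0.5 + X[, 1] - 0.7 * X[, 2])
y <- rnbinom(n, mu = mu, size = 2)    # variance = mu + mu^2 / 2
dat <- data.frame(y = y, X)

## Component-wise gradient boosting with the negative binomial log-likelihood
## as loss; the NBinomial() family re-estimates the scale (nuisance) parameter
## alongside the prediction function in each boosting iteration.
fit <- glmboost(y ~ ., data = dat, family = NBinomial(),
                control = boost_control(mstop = 500, nu = 0.1))

## Early stopping, chosen by resampling, regularizes the fit: it shrinks the
## coefficient estimates and leaves never-selected predictors at zero.
cvm <- cvrisk(fit)
fit <- fit[mstop(cvm)]

coef(fit)        # coefficients of the selected predictors only
nuisance(fit)    # estimated overdispersion parameter

Here cvrisk() selects the stopping iteration by resampling (25 bootstrap samples by default), and coef() reports only the predictors selected during boosting; this is the built-in shrinkage and variable selection mechanism referred to in the abstract.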
Cite this article
Schmid, M., Potapov, S., Pfahlberg, A. et al. Estimation and regularization techniques for regression models with multidimensional prediction functions. Stat Comput 20, 139–150 (2010). https://doi.org/10.1007/s11222-009-9162-7