Abstract
In this chapter, we present statistical modelling approaches for predictive tasks in business and science. The most prominent is the ubiquitous multiple linear regression approach, in which coefficients are estimated using the ordinary least squares algorithm. Many variants and generalizations of that technique exist. In the form of logistic regression, it has been adapted to cope with binary classification problems. Various statistical survival models allow for the modelling of time-to-event data. We detail the many benefits and a few pitfalls of these techniques using real-world examples. A primary focus is the added value that these statistical modelling tools yield over more black-box machine-learning algorithms. In our opinion, this added value stems predominantly from the often much easier interpretation of the model, from the availability of tools that pin down the influence of the predictor variables in concise form, and from the options they provide for variable selection and residual analysis, which allow for user-friendly model development, refinement, and improvement.
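As a rough illustration of the ordinary least squares idea mentioned in the abstract, the following minimal Python sketch estimates the coefficients of a multiple linear regression by solving the normal equations and returns the residuals used in residual analysis. It is not taken from the chapter: the function names `solve_linear_system` and `fit_ols` and the small noise-free data set are assumptions made purely for illustration, and a real analysis would normally rely on an established statistics package.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs stay untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(predictors, response):
    """Return (coefficients, residuals) for y = b0 + b1*x1 + ... + error.

    The coefficients minimise the sum of squared residuals; they are found
    here by solving the normal equations (X'X) b = X'y.
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    residuals = [y - f for y, f in zip(response, fitted)]
    return coef, residuals


# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2.
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
response = [1 + 2 * x1 - 3 * x2 for x1, x2 in predictors]

coef, residuals = fit_ols(predictors, response)
print(coef)  # close to [1.0, 2.0, -3.0], up to floating-point rounding
```

Each estimated coefficient can be read directly as the expected change in the response per unit change of its predictor with the other predictors held fixed, which is the kind of interpretability the abstract contrasts with black-box methods.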
Acknowledgments
The authors thank the editors for their constructive comments, which have led to significant improvements of this article.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dettling, M., Ruckstuhl, A. (2019). Statistical Modelling. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_11
DOI: https://doi.org/10.1007/978-3-030-11821-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11820-4
Online ISBN: 978-3-030-11821-1
eBook Packages: Computer Science, Computer Science (R0)