Classical and Modern Regression with Applications: Duxbury
Raymond H. Myers
Virginia Polytechnic Institute and State University
Thomson Learning
Duxbury
CONTENTS
CHAPTER
1
Multicollinearity in multiple regression data
Quality fit, quality prediction, and the HAT matrix
Categorical or indicator variables (Regression models and ANOVA models)
Exercises
References
CHAPTER
4
INFLUENCE DIAGNOSTICS
Sources of influence
Diagnostics: Residuals and the HAT matrix
Diagnostics that determine extent of influence
Influence on performance
What do we do with high influence points?
Exercises
References
CHAPTER
DETECTING AND COMBATING MULTICOLLINEARITY
Multicollinearity diagnostics
Variance proportions
Further topics concerning multicollinearity
Alternatives to least squares in cases of multicollinearity
Exercises
References
CHAPTER
NONLINEAR REGRESSION
Nonlinear least squares
Properties of the least squares estimators
The Gauss-Newton procedure for finding estimates
Other modifications of the Gauss-Newton procedure
Some special classes of nonlinear models
Further considerations in nonlinear regression
Why not transform data to linearize?
Exercises
References
SOME SPECIAL CONCEPTS IN MATRIX ALGEBRA
Solutions to simultaneous linear equations
Quadratic form
Eigenvalues and eigenvectors
The inverses of a partitioned matrix
Sherman-Morrison-Woodbury theorem
References
APPENDIX
B
SOME SPECIAL MANIPULATIONS
Unbiasedness of the residual mean square
Expected value of residual sum of squares and mean square for an underspecified model
The maximum likelihood estimator
Development of the PRESS statistic
Computation of s_{-i}
Dominance of a residual by the corresponding model error
Computation of influence diagnostics
Maximum likelihood estimator in the nonlinear model
Taylor series
Development of the C_k statistic
References
APPENDIX
INDEX
Absolute errors, minimizing sum of, 351
Alias matrix, 178
All possible regressions, 193
Analysis of covariance, 150, 151
Analysis of variance
  general ANOVA, 27, 99
  for straight line, 27
  total SS breakup, 22, 95
Assumptions for regression model, 9, 11, 19, 21, 82, 91, 275
Autocorrelation, 287
Backward elimination, 186
Best subsets regression, 185, 193
Bias in regression estimates, 178
Biased estimation, 368, 392
  principal components, 411
  ridge regression, 392
Bonferroni confidence intervals, 50, 121
Bonferroni inequality, 51, 52
Box-Cox transformation, 310
Box, G.E.P.
  and Cox, 310
  and Hunter, J.S. and Hunter, W.G., 375
  and Tidwell, 307
Categorical variables, 135
  one-way ANOVA, 150
Centering data, 11, 369
  and scaling, 384
C_k statistic, 400, 471
Coefficient of determination, 37, 39, 95
Collinearity, diagnosis of, 125
Condition number, 370
Confidence interval for
  E(Y), 41, 112
  intercept, 31, 32
  slope, 32
Confidence region on regression line, 50
Correlation
  between x and y, 67
  matrix, 76
  and regression, 66
Covariance of ȳ, b_1, 15
C_p statistic, 182
Cox, D.R. and Box, 310
Degrees of freedom, 18
Deviance, 323, 347
Distribution
  between x and y, 67
  bivariate normal, 67
Dummy variables. See Categorical variables
Durbin-Watson statistic, 288
Eigenvalues, 126
Eigenvectors, 126
Error structure
  additive, 297
  multiplicative, 297
Error sum of squares, 19, 89
Errors in regressors, 357
Estimation
  linear least squares, 12, 88
  maximum likelihood, 20, 92, 470
  nonlinear, 424, 425
    Gauss-Newton, 426
    initial estimates, 427
    Marquardt's compromise, 433
Estimator
  best linear unbiased, 92
  bias, 60, 178
  minimum variance unbiased, 92
Examination of regression equation
  R^2, 37, 39, 95
  residual examination, 57, 211
Expected value of MS, 27, 89, 118
Exponential error family, 340
  specific cases, 440
Interactions, 145
Iteratively reweighted least squares, 351
Joint confidence region on slope and intercept, 48
Lack of fit, 118
Least squares
  assumptions, 9, 11, 82, 91
  Galton, F., 1
  generalized, 278
  iteratively reweighted, 351-352
  maximum likelihood, 20
  for nonlinear, 425, 426
  properties, 14, 91, 92
Leverage, 252
Likelihood function, 20, 470
Likelihood ratio test, 322
Link function, 340-344
Logistic growth model, 437
Logistic regression, 317
Mallows' C_p, 182
Marquardt, D.W., 433
  procedure, 433
Minimum variance unbiased estimator, 92
Mitcherlich's law, 439
Model
  Gompertz, 437
  logistic, 437
  mean shift outlier, 222
  Michaelis-Menten, 435
  Mitcherlich, 439
  Richards, 439
  Weibull, 439
Multicollinearity, 125, 368
Multiple correlation coefficient. See R^2
Multiple regression, 82
Multivariate normal density, 470
Nonlinear estimation
  examples, 430, 434
  growth models, 437
  least squares, 425
  starting values for, 440
Nonlinear growth models, 438, 439
  types. See Model
Normal distribution
  tables of, 475
Normal equations
  matrix solution, 88
  multiple regression, 88
  straight line, 12
Normal probability plots, 60
Outliers, 221
  diagnosis of, 221
  R-Student, 223
  studentized residuals, 217
Overfitting, 179
Partial F-test
  definition, 102, 103
  in selection procedures, 186
Plots
  augmented partial plots, 204
  DF trace, 401
  partial regression, 233
  partial residual plots, 238
  residual, 57, 211
  ridge trace, 396
Poisson regression, 332
Polynomial models, 83
Power transformations, 310
Prediction interval, 41, 112
Prediction-oriented model criteria, 167-172
PRESS
  residuals, 171
  statistic, 171
  sum of absolute PRESS residuals, 177
Principal components regression, 411
Probability tables
  F, 477-480
  normal, 475
  t, 476
Proportions as responses, 315-323
PR(Ridge), 397
Pure error, 118
  repeat runs, 118
Quadratic form, 454
R^2, 37, 39, 95
Regression analysis
  assumptions, 9, 11, 19, 21, 82, 91, 275
  multiple, 82
  purpose of, 2, 3, 4
  straight line, 9
Regression equation, examination of
  residual analyses, 57
Regression through origin, 33
  estimate of slope, 33
Repeat runs, 118
Residual MS, 19, 89
Residual plots, 57, 211
Residuals
  Cook's D statistic, 259, 260
  examination of, 57, 211
  outliers, 221
  PRESS, 171
  studentized, 63, 221, 222
Response variable, transformation of, 310
Richards, F.J., 437
Richards growth model, 437
Ridge regression, 392
  choice of k, 396-412
  ridge trace, 396
Robust regression, 349-357
  M-estimators, 349-357
R-Student statistic, 222-224
s^2, 19, 89
SAS programs
  MAXR, 190
  NLIN, 435
  PROC REG, 234
  Stepwise, 186
Scaling and centering, 384
Selection procedures
  all possible regressions, 193
  backward elimination, 186
  best subsets regression, 185, 193
  forward selection, 185
  Mallows' C_p, 182
  PRESS, 171
Sequential F-test in selection procedures, 185
Sherman-Morrison-Woodbury theorem, 459
Simultaneous inference, 47
Snee, R.D., 169
Squared multiple correlation. See R^2
Stabilizing variance, transformations for, 286
Stagewise regression, 185
Standard error
  of b_0, 15
  of b_1, 15
  of prediction, 42, 112
  of ŷ, 42, 112
Stepwise regression, 186
Straight line regression
  ANOVA, 27
  assumptions, 9, 11, 19, 21
  normal equations, 12
Studentized residuals, 63, 221, 222
Sum of squares (SS)
  breakup of total SS, 22, 95
  for lack of fit, 118
  for pure error, 118
  for regression, 22, 95
  for residual, 19
Tables
  Chi square, 484
  Durbin-Watson test, 485
  F-test, 477-480
  normal distribution, 475
  outlier test, 481-482
  rankits, 483
  t-test, 476
Taylor series expansion, 309, 427, 470
Tidwell, P.W. and Box, 307-309
Transformations
  hazards of, 297-299
  to linearize, 444
  logarithmic, 295
  power, 307, 310
  reciprocal, 296
  on regressors
    Box and Tidwell, 307
  on response, 310
  to stabilize variance, 286
t-table, 476
t-test
  for intercept, 30
  for regression coefficient, 98
  for slope, 30
Underspecification, 178
Validation of model, 167
  data splitting, 169
  PRESS, 170
Variance
  of intercept, 15
  of predicted value, 43, 112
  of slope, 15
Variance-covariance matrix, 91
Variance decomposition proportions, 371-372
Variance inflation factor (VIF), 127, 369
Variance stabilizing transformation, 286
Weibull growth model, 439
Weighted least squares, 279-281
  example, 281
Working-Hotelling confidence interval, 49
x-variables
  centered and scaled, 369, 384
  transformations of, 307