Regression Modeling Strategies: Frank E. Harrell, Jr.
Regression Modeling Strategies
With Applications to Linear Models, Logistic Regression, and Survival Analysis
Springer
Contents
Preface
vii
Typographical Conventions
1 Introduction
1.1
1.2
1.3
1.4
1.5
xxiii
1
1
3
4
6
6
8
11
11
12
13
14
14
2.3.3
2.4
15
16
2.4.1
16
2.4.2
Splines for Estimating Shape of Regression Function and Determining Predictor Transformations
18
2.4.3
19
2.4.4
20
2.4.5
23
2.4.6
Nonparametric Regression
24
2.4.7
26
2.5
26
2.6
27
2.7
29
2.7.1
Regression Assumptions
29
2.7.2
32
2.7.3
34
2.7.4
Distributional Assumptions
35
2.8
Further Reading
36
2.9
Problems
37
Missing Data
41
3.1
41
3.2
Prelude to Modeling
42
3.3
43
3.4
43
3.5
44
3.6
47
3.7
Multiple Imputation
47
3.8
48
3.9
Further Reading
50
3.10
Problems
51
53
53
4.2
4.3
4.4
Variable Selection
Overfitting and Limits on Number of Predictors
56
60
4.5
Shrinkage
61
4.6
4.7
Collinearity
Data Reduction
64
66
4.8
4.9
4.10
4.11
5.1
5.2
5.3
5.4
5.5
56
4.7.1
4.7.2
4.7.3
4.7.4
Variable Clustering
Transformation and Scaling Variables Without Using Y
Simultaneous Transformation and Imputation
Simple Scoring of Variable Clusters
66
67
69
70
4.7.5
4.7.6
72
73
74
77
79
79
4.10.2
82
The Bootstrap
83
84
87
87
Model Validation
5.2.1
Introduction
5.2.2
Which Quantities Should Be Used in Validation?
90
90
91
5.2.3
5.2.4
5.2.5
91
93
94
Data-Splitting
Improvements on Data-Splitting: Resampling
Validation Using the Bootstrap
S-PLUS Software
97
98
98
99
101
105
6.1
106
6.2
User-Contributed Functions
107
6.3
108
6.4
Other Functions
119
6.5
Further Reading
120
121
7.1
Descriptive Statistics
122
7.2
7.3
128
7.4
131
7.5
135
7.6
135
7.7
136
7.8
137
7.9
Problems
142
147
8.1
Data
147
8.2
150
8.3
Variable Clustering
151
8.4
8.5
157
8.6
160
8.7
168
8.8
169
8.9
170
8.10
Multiple Imputation
172
8.11
Further Reading
175
8.12
Problems
176
154
179
9.1
179
9.2
Hypothesis Tests
183
9.3
9.2.1
183
9.2.2
Wald Test
184
9.2.3
Score Test
184
9.2.4
185
General Case
186
9.3.1
187
9.3.2
187
9.3.3
189
9.3.4
190
9.4
Iterative ML Estimation
192
9.5
193
9.6
194
9.7
195
9.8
202
9.9
9.8.1
202
9.8.2
9.8.3
203
9.8.4
205
203
206
9.10
207
9.11
Further Reading
210
9.12
Problems
212
10.2
Model
215
215
10.1.1
217
10.1.2
220
10.1.3
Detailed Example
221
10.1.4
Design Formulations
227
Estimation
228
10.2.1
228
10.2.2
228
10.3
Test Statistics
229
10.4
Residuals
230
10.5
10.6
10.7
10.8
10.9
S-PLUS Functions
230
244
245
247
249
253
257
264
265
269
269
11.4
11.5
276
285
291
11.6
294
11.3
271
299
12.1
Descriptive Statistics
12.2
12.3
12.4
12.5
12.6
12.7
12.8
Background
Ordinality Assumption
Proportional Odds Model
13.3.1 Model
13.3.2 Assumptions and Interpretation of Parameters
300
331
331
332
333
333
333
13.3.3
Estimation
334
13.3.4
Residuals
334
13.3.5
335
13.3.6
335
13.3.7
337
13.3.8
S-PLUS Functions
337
13.4
13.5
13.6
338
338
338
339
13.4.4
Residuals
339
13.4.5
339
13.4.6
Extended CR Model
339
13.4.7
13.4.8
340
341
13.4.9
S-PLUS Functions
341
Further Reading
Problems
342
342
Response Variable
Variable Clustering
14.3
14.4
14.5
14.6
14.7
14.8
14.9
14.10
14.11
14.12
14.13
346
347
349
351
352
355
357
357
359
364
367
369
371
14.14 Problems
371
375
375
376
376
377
378
379
413
413
413
414
416
417
417
17.3
17.4
17.2.2
418
17.2.3
419
17.2.4
Specific Models
421
17.2.5
Estimation
422
17.2.6
423
426
17.3.1
Model
426
17.3.2
427
17.3.3
Specific Models
427
17.3.4
Estimation
428
17.3.5
Residuals
429
17.3.6
430
17.3.7
434
435
17.5
Design Formulations
435
17.6
Test Statistics
435
17.7
436
17.8
S-PLUS Functions
436
17.9
Further Reading
17.10 Problems
441
441
18.3
18.4
18.5
458
18.6
Problems
464
Model
454
454
465
465
19.1.1
Preliminaries
465
19.1.2
Model Definition
466
19.1.3
Estimation of β
466
19.1.4 Model Assumptions and Interpretation of Parameters
19.1.5 Example
19.1.6 Design Formulations
19.1.7 Extending the Model by Stratification
19.2 Estimation of Survival Probability and Secondary Parameters
468
468
470
470
472
19.3
474
Test Statistics
19.4
19.5
Residuals
476
Assessment of Model Fit
476
19.5.1 Regression Assumptions
477
19.5.2 Proportional Hazards Assumption
483
19.6 What to Do When PH Fails
489
19.7 Collinearity
491
19.8 Overly Influential Observations
492
19.9 Quantifying Predictive Ability
492
19.10 Validating the Fitted Model
493
19.10.1 Validation of Model Calibration
493
19.10.2 Validation of Discrimination and Other Statistical Indexes
494
19.11 Describing the Fitted Model
496
19.12
S-PLUS Functions
499
506
513
516
517
517
519
522
Appendix
523
References
527
Index
559