Assignment
Assignment
2025-02-17
#Problem-3.1
The data below are remission times, in weeks, for a group of 30 patients with leukemia who received similar
treatment. Asterisks denote censoring times.
1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26, 29, 31*, 42, 45*, 50*, 57, 60, 71*, 85*, 91.
1. Obtain and plot the Kaplan-Meier estimate Ŝ(t) of the survivor function for remission time.
2. Obtain approximate .95 confidence intervals for the median remission time and for the probability that
remission lasts over 26 weeks.
3. Plot log(log Ŝ(t)) and log  (t) on the same graph, where Ĥ(t) is the Nelson-Aalen estimate. Is there
much difference?
(1) Obtain and plot the Kaplan-Meier estimate Ŝ(t) of the survivor function for
remission time.
## [1] 1 1 2 4 4 6 6 6 7 8 9 9 10 12 13 14 18 19 24 26 29 31 42 45 50
## [26] 57 60 71 85 91
1
censoring <- rep(1, length(remission_times))
censoring
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
censoring[c(22, 24, 25, 28, 29)] <- 0 # Censored at 31, 45, 50, 71, and 85 weeks
censoring
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 0 1
## [1] 1 1 2 4 4 6 6 6 7 8 9 9 10 12 13 14 18 19 24
## [20] 26 29 31+ 42 45+ 50+ 57 60 71+ 85+ 91
• Kaplan-Meier estimate
summary(km_fit)
2
• Plot the Kaplan-Meier estimate
1.00
Survival Probability
0.75
0.50
0.25
+ + +
+ +
0.00
0 25 50 75 100
Time (weeks)
Number at risk
Strata
All 30 11 6 2 0
0 25 50 75 100
Time (weeks)
(2) Obtain approximate .95 confidence intervals for the median remission time
and # for the probability that remission lasts over 26 weeks.
3
prob_26 <- summary(km_fit, times = 26)$surv
prob_26
## [1] 0.3333333
## [1] 0.2009553
• Print results
print(ci_median)
##
## Estimated probability of remission lasting over 26 weeks: 0.3333333
(c) Plot log(log Ŝ(t)) and log  (t) on the same graph, where Ĥ(t) is the Nelson-
Aalen estimate. Is there much difference?
• Nelson-Aalen estimate
4
na_df <- data.frame(time = na_fit$time, H_t = cumsum(na_fit$n.event / na_fit$n.risk))
na_df
## time H_t
## 1 1 0.06666667
## 2 2 0.10238095
## 3 4 0.17645503
## 4 6 0.29645503
## 5 7 0.34190957
## 6 8 0.38952862
## 7 9 0.48952862
## 8 10 0.54508418
## 9 12 0.60390770
## 10 13 0.66640770
## 11 14 0.73307437
## 12 18 0.80450294
## 13 19 0.88142602
## 14 24 0.96475935
## 15 26 1.05566844
## 16 29 1.15566844
## 17 31 1.15566844
## 18 42 1.28066844
## 19 45 1.28066844
## 20 50 1.28066844
## 21 57 1.48066844
## 22 60 1.73066844
## 23 71 1.73066844
## 24 85 1.73066844
## 25 91 2.73066844
## time S_t
## 1 1 0.9333333
## 2 2 0.9000000
## 3 4 0.8333333
## 4 6 0.7333333
## 5 7 0.7000000
## 6 8 0.6666667
## 7 9 0.6000000
## 8 10 0.5666667
## 9 12 0.5333333
## 10 13 0.5000000
## 11 14 0.4666667
## 12 18 0.4333333
## 13 19 0.4000000
## 14 24 0.3666667
## 15 26 0.3333333
## 16 29 0.3000000
5
## 17 31 0.3000000
## 18 42 0.2625000
## 19 45 0.2625000
## 20 50 0.2625000
## 21 57 0.2100000
## 22 60 0.1575000
## 23 71 0.1575000
## 24 85 0.1575000
## 25 91 0.0000000
km_df <- km_df[km_df$S_t > 0, ] # Remove zero survival values to avoid log(0)
km_df
## time S_t
## 1 1 0.9333333
## 2 2 0.9000000
## 3 4 0.8333333
## 4 6 0.7333333
## 5 7 0.7000000
## 6 8 0.6666667
## 7 9 0.6000000
## 8 10 0.5666667
## 9 12 0.5333333
## 10 13 0.5000000
## 11 14 0.4666667
## 12 18 0.4333333
## 13 19 0.4000000
## 14 24 0.3666667
## 15 26 0.3333333
## 16 29 0.3000000
## 17 31 0.3000000
## 18 42 0.2625000
## 19 45 0.2625000
## 20 50 0.2625000
## 21 57 0.2100000
## 22 60 0.1575000
## 23 71 0.1575000
## 24 85 0.1575000
ggplot() +
geom_line(data = km_df, aes(x = log(time), y = log(-log(S_t))), color = "blue", size = 1) +
geom_line(data = na_df, aes(x = log(time), y = log(H_t)), color = "red", size = 1) +
labs(title = "Comparison of Log-Transformed Survival and Hazard Functions",
x = "log(Time)", y = "log(-log(S(t))) / log(H(t))") +
theme_minimal()
## Warning: Using ‘size‘ aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use ‘linewidth‘ instead.
## This warning is displayed once every 8 hours.
## Call ‘lifecycle::last_lifecycle_warnings()‘ to see where this warning was
## generated.
6
Comparison of Log−Transformed Survival and Hazard Functions
1
0
log(−log(S(t))) / log(H(t))
−1
−2
0 1 2 3 4
log(Time)
--------------------------------------------------------------------------------------------------------------------
#Problem-3.2
The data below show survival times (in months) of patients with Hodgkin’s dis- ease who were treated
with nitrogen mustards (Bartolucci and Dickey, 1977). Group A patients received little or no prior therapy,
whereas Group B patients received heavy prior therapy. Starred observations are censoring times.
Group A 1.25, 1.41, 4.98, 5.25, 5.38, 6.92, 8.89, 10.98, 11.18, 13.11, 13.21, 16.33, 19.77, 21.08, 21.84*, 22.07,
31.38*, 32,62*, 37.18*, 42.92. Group B 1,05, 2.92, 3.61, 4,20, 4.49, 6.72, 7.31, 9.08, 9.11, 14.49*, 16.85,
18.82*, 26.59*, 30.26*, 41.34*.
1. Obtain and compare Kaplan-Meier estimates for the two groups. Does there appear to be a difference
in the 1-year survival probability for the two types of patients? Give confidence limits for S(1) and for
the median survival time 1.50 for each group.
2. Use plots of the Nelson-Aalen estimate (t) to examine and compare the two life distributions.
3. Do any parametric models whereby one tions suggest themselves? might compare the two distribu-
7
(1) Obtain and compare Kaplan-Meier estimates for the two groups. Does there
appear to be a difference in the 1-year survival probability for the two types of
patients? Give confidence limits for S(1) and for the median survival time 1.50
for each group.
group_A <- c(1.25, 1.41, 4.98, 5.25, 5.38, 6.92, 8.89, 10.98, 11.18, 13.11,
13.21, 16.33, 19.77, 21.08, 21.84, 22.07, 31.38, 32.62, 37.18, 42.92)
censor_A <- rep(1, length(group_A))
censor_A
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
length(censor_A)
## [1] 20
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 1
group_B <- c(1.05, 2.92, 3.61, 4.20, 4.49, 6.72, 7.31, 9.08, 9.11, 14.49,16.85,18.82,26.59,30.26,41.34)
length(group_B)
## [1] 15
## [1] 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0
## [1] 1.25 1.41 4.98 5.25 5.38 6.92 8.89 10.98 11.18 13.11
## [11] 13.21 16.33 19.77 21.08 21.84+ 22.07 31.38+ 32.62+ 37.18+ 42.92
## [1] 1.05 2.92 3.61 4.20 4.49 6.72 7.31 9.08 9.11 14.49+
## [11] 16.85 18.82+ 26.59+ 30.26+ 41.34+
8
km_A <- survfit(surv_A ~ 1);km_A
summary(km_A)
summary(km_B)
9
• Plot Kaplan-Meier curves
ggsurvplot_combine(
list("Group A" = km_A, "Group B" = km_B),
conf.int = TRUE, risk.table = TRUE,
xlab = "Time (Months)", ylab = "Survival Probability",
title = "Kaplan-Meier Estimates for Group A and B",
legend.title = "Groups"
)
1.00
Survival Probability
0.75
0.50
+
+ + + + +
0.25 ++ +
0.00
0 10 20 30 40
Time (Months)
Number at risk
Groups
Group A::All 20 13 7 4 1
Group B::All 15 6 3 2 1
0 10 20 30 40
Time (Months)
• Compute 95% confidence intervals for S(1) and median survival time
10
(2) Use plots of the Nelson-Aalen estimate (t) to examine and compare the two
life distributions.
• Nelson-Aalen estimates for cumulative hazard
Nelson−Aalen Estimates
Group A
Group B
1.2
Cumulative Hazard
0.8
0.4
0.0
0 10 20 30 40
Time (Months)
11
--------------------------------------------------------------------------------------------------------------------
#Problem-3.5
Pike (1966) gave results of a laboratory experiment in which 19 female rats were painted with the carcinogen
DMBA. The number of days T until the appearance of a carcinoma was of interest, and the data gave the
following times (asterisks denote censoring limes):
143, 164, 188, 188, 190, 192, 206, 209, 213, 216, 220, 227, 230, 234, 246, 265,304, 216, 244*
1. It was thought that carcinomas could not appear before some threshold time > 0, so a Weibull model
for T’ Ty was considered. Give twa Weibull probability plots, using (1) the raw data (1-values), and
(2) the values 1-100. Is there any strong indication that T-100 is closer to Weibull-distributed than is
T?
2. Obtain a nonparametric .95 confidence interval for the median time to car- cinoma, 1.50. Comment
on the advantages and disadvantages of this esti- mate over one based on a Weibull model.
(1) It was thought that carcinomas could not appear before some threshold time
> 0, so a Weibull model for T’ Ty was considered. Give twa Weibull probability
plots, using (1) the raw data (1-values), and (2) the values 1-100. Is there any
strong indication that T-100 is closer to Weibull-distributed than is T?
• Data: Observed times (T) and censoring indicator
times <- c(143, 164, 188, 188, 190, 192, 206, 209, 213, 216,
220, 227, 230, 234, 246, 265, 304, 216, 244)
times
## [1] 143 164 188 188 190 192 206 209 213 216 220 227 230 234 246 265 304 216 244
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
length(censored)
## [1] 19
censored[c(18,19)]<-0
length(censored)
## [1] 19
censored
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
12
times_transformed <- times - 100
times_transformed
## [1] 43 64 88 88 90 92 106 109 113 116 120 127 130 134 146 165 204 116 144
250
200
150
0 1 2 3
Theoretical Quantiles
weibull_plot_raw
## $x
## [1] 0.02666825 0.08223810 0.14107860 0.20359896 0.27029033 0.34174929
## [7] 0.41871033 0.50209194 0.59306372 0.69314718 0.80437282 0.92953596
## [13] 1.07263680 1.23969089 1.44036158 1.69167601 2.02814825 2.53897387
## [19] 3.63758616
##
## $y
## [1] 143 164 188 188 190 192 206 209 213 216 216 220 227 230 234 244 246 265 304
13
weibull_plot_transformed <- qqplot(qweibull(ppoints(times_transformed), shape=1),
times_transformed,
main = "Weibull Probability Plot (T' = T - 100)",
xlab = "Theoretical Quantiles", ylab = "Observed Data")
abline(lm(times_transformed ~ qweibull(ppoints(times_transformed), shape=1)), col = "blue")
150
100
50
0 1 2 3
Theoretical Quantiles
weibull_plot_transformed
## $x
## [1] 0.02666825 0.08223810 0.14107860 0.20359896 0.27029033 0.34174929
## [7] 0.41871033 0.50209194 0.59306372 0.69314718 0.80437282 0.92953596
## [13] 1.07263680 1.23969089 1.44036158 1.69167601 2.02814825 2.53897387
## [19] 3.63758616
##
## $y
## [1] 43 64 88 88 90 92 106 109 113 116 116 120 127 130 134 144 146 165 204
length(censored)
## [1] 19
14
(2) Obtain a nonparametric .95 confidence interval for the median time to car-
cinoma, 1.50. Comment on the advantages and disadvantages of this esti- mate
over one based on a Weibull model.
## [1] 143 164 188 188 190 192 206 209 213 216 220 227 230 234 246
## [16] 265 304 216+ 244+
summary(km_fit)
## $quantile
## 50
## 216
15
##
## $lower
## 50
## 206
##
## $upper
## 50
## 265
• Kaplan-Meier plot
ggsurvplot(km_fit,
data = data.frame(times = times, censored = censored), # Add this line with the data
conf.int = TRUE,
xlab = "Time (Days)",
ylab = "Survival Probability",
title = "Kaplan-Meier Estimate of Carcinoma Time")
1.00
Survival Probability
0.75
0.50 +
0.25 +
0.00
0 100 200 300
Time (Days)
--------------------------------------------------------------------------------------------------------------------
#Problem-3.14
The following data are survival times for 121 breast cancer patients treated over the period 1929-1938, quoted
in Boag (1949). Times are in months, and asterisks denote censoring times.
16
0.3, 0.3*, 4, 5, 5.6, 6.2 6.3, 6.6 6.8, 7.4*, 7.5, 8.4, 8.4, 10.3, 11, 11.8, 12.2, 12.3, 13.5, 14.4, 14.4, 14.8, 15.5*,
15.7, 16.2, 16.3, 16.5, 16.8, 17.2, 17.3, 17.5, 17.9, 19.8 20.4, 20.9, 21, 21, 21.1, 23, 23.4*, 23.6, 24, 24, 27.9,
28.2, 29.1, 30, 31, 31, 32, 35, 35, 37*, 37*, 37*, 38, 38*, 38*, 39*, 39*, 40, 40*, 40*, 41, 41, 41*, 42, 43**,
43*, 43*, 44, 45*, 45*, 46*, 46, 47*, 48, 49*, 51, 51, 51*, 52, 54, 55*, 56, 57*, 58**, 59**, 60, 60”, 60*, 61*,
62*, 65*, 65*, 67, 67, 68*, 69, 78, 80, 83*, 88*, 89, 90, 93, 96**, 103, 105*, 109**, 109*, 111*, 115*, 117,
125, 126, 127, 129*, 129* 139*, 154*,
1. Calculate the Kaplan-Meier estimate of the survivor function. Estimate 1- and 5-year survival proba-
bilities and give a standard error for these esti- mates.
2. Group the data into a life table with 1-year intervals. Compare the 1- and 5-year survival probability
estimates with those obtained in part (a),
times <- c(0.3, 0.3, 4.0, 5.0, 5.6, 6.2, 6.3, 6.6, 6.8, 7.4, 7.5, 8.4, 8.4, 10.3, 11.0, 11.8, 12.2, 12.3
length(times)
## [1] 121
## [1] 121
censored[c(2,3,10,23,40,53,54,55,57,58,59,60,62,63,66,68,69,70,72,73,74,75,76,78,81,84,86,87,88,90,91,92
• Kaplan-Meier estimate
summary(km_fit)
17
## Call: survfit(formula = surv_obj ~ 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.3 121 1 0.992 0.00823 0.976 1.000
## 5.0 118 1 0.983 0.01169 0.961 1.000
## 5.6 117 1 0.975 0.01429 0.947 1.000
## 6.2 116 1 0.967 0.01646 0.935 0.999
## 6.3 115 1 0.958 0.01834 0.923 0.995
## 6.6 114 1 0.950 0.02001 0.911 0.990
## 6.8 113 1 0.941 0.02152 0.900 0.984
## 7.5 111 1 0.933 0.02294 0.889 0.979
## 8.4 110 2 0.916 0.02547 0.867 0.967
## 10.3 108 1 0.907 0.02660 0.857 0.961
## 11.0 107 1 0.899 0.02767 0.846 0.955
## 11.8 106 1 0.890 0.02868 0.836 0.948
## 12.2 105 1 0.882 0.02964 0.826 0.942
## 12.3 104 1 0.873 0.03054 0.816 0.935
## 13.5 103 1 0.865 0.03140 0.806 0.929
## 14.4 102 2 0.848 0.03300 0.786 0.915
## 14.8 100 1 0.840 0.03374 0.776 0.908
## 15.7 98 1 0.831 0.03446 0.766 0.901
## 16.2 97 1 0.822 0.03516 0.756 0.894
## 16.3 96 1 0.814 0.03582 0.747 0.887
## 16.5 95 1 0.805 0.03645 0.737 0.880
## 16.8 94 1 0.797 0.03706 0.727 0.873
## 17.2 93 1 0.788 0.03764 0.718 0.865
## 17.3 92 1 0.780 0.03819 0.708 0.858
## 17.5 91 1 0.771 0.03872 0.699 0.851
## 17.9 90 1 0.762 0.03923 0.689 0.843
## 19.8 89 1 0.754 0.03971 0.680 0.836
## 20.4 88 1 0.745 0.04017 0.671 0.828
## 20.9 87 1 0.737 0.04061 0.661 0.821
## 21.0 86 2 0.720 0.04144 0.643 0.806
## 21.1 84 1 0.711 0.04182 0.634 0.798
## 23.0 83 1 0.702 0.04218 0.624 0.790
## 23.6 81 1 0.694 0.04254 0.615 0.782
## 24.0 80 2 0.676 0.04321 0.597 0.767
## 27.9 78 1 0.668 0.04352 0.588 0.759
## 28.2 77 1 0.659 0.04381 0.579 0.751
## 29.1 76 1 0.650 0.04408 0.570 0.743
## 30.0 75 1 0.642 0.04434 0.560 0.735
## 31.0 74 2 0.624 0.04481 0.542 0.719
## 32.0 72 1 0.616 0.04502 0.534 0.711
## 35.0 71 2 0.598 0.04539 0.516 0.694
## 38.0 66 1 0.589 0.04560 0.506 0.686
## 40.0 61 1 0.580 0.04586 0.496 0.677
## 41.0 58 2 0.560 0.04641 0.476 0.658
## 42.0 55 1 0.550 0.04667 0.465 0.649
## 44.0 51 1 0.539 0.04698 0.454 0.639
## 48.0 45 1 0.527 0.04744 0.442 0.628
## 51.0 43 2 0.502 0.04829 0.416 0.606
## 52.0 40 1 0.490 0.04869 0.403 0.595
## 54.0 39 1 0.477 0.04903 0.390 0.584
## 56.0 37 1 0.464 0.04937 0.377 0.572
18
## 60.0 33 1 0.450 0.04984 0.362 0.559
## 78.0 22 1 0.430 0.05161 0.340 0.544
## 80.0 21 1 0.409 0.05305 0.317 0.528
## 89.0 18 1 0.387 0.05476 0.293 0.510
## 90.0 17 1 0.364 0.05606 0.269 0.492
## 126.0 6 1 0.303 0.07243 0.190 0.484
ggsurvplot(km_fit,
data = data.frame(times = times, censored = censored), conf.int = TRUE,
xlab = "Time (Months)", ylab = "Survival Probability",
title = "Kaplan-Meier Estimate of Survival",
risk.table = TRUE)
1.00 +++
Survival Probability
+
0.75 +
++++++
0.50 +++++
+++++++++++
+ + ++ ++ ++ ++ +
++ + +
0.25
0.00
0 50 100 150
Time (Months)
Number at risk
Strata
All 121 43 14 1
0 50 100 150
Time (Months)
• Calculate the 1-year (12 months) and 5-year (60 months) survival probabilities
km_fit$surv
19
## [36] 0.7024766 0.6938041 0.6764590 0.6677864 0.6591139 0.6504413 0.6417688
## [43] 0.6244237 0.6157511 0.5984060 0.5984060 0.5893393 0.5893393 0.5796780
## [50] 0.5596891 0.5495129 0.5495129 0.5387381 0.5387381 0.5387381 0.5387381
## [57] 0.5267662 0.5267662 0.5022654 0.4897088 0.4771522 0.4771522 0.4642562
## [64] 0.4642562 0.4642562 0.4642562 0.4501878 0.4501878 0.4501878 0.4501878
## [71] 0.4501878 0.4501878 0.4501878 0.4297247 0.4092616 0.4092616 0.4092616
## [78] 0.3865249 0.3637881 0.3637881 0.3637881 0.3637881 0.3637881 0.3637881
## [85] 0.3637881 0.3637881 0.3637881 0.3637881 0.3031568 0.3031568 0.3031568
## [92] 0.3031568 0.3031568
km_fit$time
## [1] 0.3 4.0 5.0 5.6 6.2 6.3 6.6 6.8 7.4 7.5 8.4 10.3
## [13] 11.0 11.8 12.2 12.3 13.5 14.4 14.8 15.5 15.7 16.2 16.3 16.5
## [25] 16.8 17.2 17.3 17.5 17.9 19.8 20.4 20.9 21.0 21.1 23.0 23.4
## [37] 23.6 24.0 27.9 28.2 29.1 30.0 31.0 32.0 35.0 37.0 38.0 39.0
## [49] 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 51.0 52.0
## [61] 54.0 55.0 56.0 57.0 58.0 59.0 60.0 61.0 62.0 65.0 67.0 68.0
## [73] 69.0 78.0 80.0 83.0 88.0 89.0 90.0 93.0 96.0 103.0 105.0 109.0
## [85] 111.0 115.0 117.0 125.0 126.0 127.0 129.0 139.0 154.0
## numeric(0)
## [1] 0.4501878
(2) Group the data into a life table with 1-year intervals. Compare the 1- and
5-year survival probability estimates with those obtained in part (a),
• Using the survfit object to calculate the survival at specific time points
• This can be used to generate life table estimates
20