Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
2 views

Assignment

Uploaded by

rhrakib216
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Assignment

Uploaded by

rhrakib216
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

ASSIGNMENT

HUMAUN FARID SOHAG

2025-02-17

#Problem-3.1
The data below are remission times, in weeks, for a group of 30 patients with leukemia who received similar
treatment. Asterisks denote censoring times.
1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26, 29, 31*, 42, 45*, 50*, 57, 60, 71*, 85*, 91.

1. Obtain and plot the Kaplan-Meier estimate Ŝ(t) of the survivor function for remission time.
2. Obtain approximate .95 confidence intervals for the median remission time and for the probability that
remission lasts over 26 weeks.

3. Plot log(log Ŝ(t)) and log  (t) on the same graph, where Ĥ(t) is the Nelson-Aalen estimate. Is there
much difference?

# Load necessary libraries


#install.packages("survival")
#install.packages("survminer")
library(survival)
library(survminer)
library(ggplot2)
library(MASS)

(1) Obtain and plot the Kaplan-Meier estimate Ŝ(t) of the survivor function for
remission time.

• Define remission times

remission_times <- c(1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14,


18, 19, 24, 26, 29, 31, 42, 45, 50, 57, 60, 71, 85, 91)
remission_times

## [1] 1 1 2 4 4 6 6 6 7 8 9 9 10 12 13 14 18 19 24 26 29 31 42 45 50
## [26] 57 60 71 85 91

• Define censoring status (1 = event, 0 = censored)

1
censoring <- rep(1, length(remission_times))
censoring

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

censoring[c(22, 24, 25, 28, 29)] <- 0 # Censored at 31, 45, 50, 71, and 85 weeks
censoring

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 0 1

• Create a survival object

surv_obj <- Surv(time = remission_times, event = censoring)


surv_obj

## [1] 1 1 2 4 4 6 6 6 7 8 9 9 10 12 13 14 18 19 24
## [20] 26 29 31+ 42 45+ 50+ 57 60 71+ 85+ 91

• Kaplan-Meier estimate

km_fit <- survfit(surv_obj ~ 1)


km_fit

## Call: survfit(formula = surv_obj ~ 1)


##
## n events median 0.95LCL 0.95UCL
## [1,] 30 25 13.5 9 42

summary(km_fit)

## Call: survfit(formula = surv_obj ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 30 2 0.933 0.0455 0.8482 1.000
## 2 28 1 0.900 0.0548 0.7988 1.000
## 4 27 2 0.833 0.0680 0.7101 0.978
## 6 25 3 0.733 0.0807 0.5910 0.910
## 7 22 1 0.700 0.0837 0.5538 0.885
## 8 21 1 0.667 0.0861 0.5176 0.859
## 9 20 2 0.600 0.0894 0.4480 0.804
## 10 18 1 0.567 0.0905 0.4144 0.775
## 12 17 1 0.533 0.0911 0.3816 0.745
## 13 16 1 0.500 0.0913 0.3496 0.715
## 14 15 1 0.467 0.0911 0.3183 0.684
## 18 14 1 0.433 0.0905 0.2878 0.652
## 19 13 1 0.400 0.0894 0.2581 0.620
## 24 12 1 0.367 0.0880 0.2291 0.587
## 26 11 1 0.333 0.0861 0.2010 0.553
## 29 10 1 0.300 0.0837 0.1737 0.518
## 42 8 1 0.263 0.0812 0.1432 0.481
## 57 5 1 0.210 0.0801 0.0994 0.444
## 60 4 1 0.158 0.0754 0.0617 0.402
## 91 1 1 0.000 NaN NA NA

2
• Plot the Kaplan-Meier estimate

ggsurvplot(km_fit, data = data.frame(remission_times, censoring),


conf.int = TRUE,
title = "Kaplan-Meier Estimate of Survivor Function",
xlab = "Time (weeks)",
ylab = "Survival Probability",
risk.table = TRUE)

Kaplan−Meier Estimate of Survivor Function


Strata + All

1.00
Survival Probability

0.75

0.50

0.25
+ + +
+ +
0.00
0 25 50 75 100
Time (weeks)
Number at risk
Strata

All 30 11 6 2 0
0 25 50 75 100
Time (weeks)

(2) Obtain approximate .95 confidence intervals for the median remission time
and # for the probability that remission lasts over 26 weeks.

• Confidence interval for median remission time

ci_median <- summary(km_fit)$table[c("median", "0.95LCL", "0.95UCL")]


ci_median

## median 0.95LCL 0.95UCL


## 13.5 9.0 42.0

• Probability that remission lasts over 26 weeks

3
prob_26 <- summary(km_fit, times = 26)$surv
prob_26

## [1] 0.3333333

ci_26 <- summary(km_fit, times = 26)$lower


ci_26

## [1] 0.2009553

• Print results

cat("Approximate 95% CI for median remission time:\n")

## Approximate 95% CI for median remission time:

print(ci_median)

## median 0.95LCL 0.95UCL


## 13.5 9.0 42.0

cat("\nEstimated probability of remission lasting over 26 weeks: ", prob_26, "\n")

##
## Estimated probability of remission lasting over 26 weeks: 0.3333333

cat("95% CI for P(T > 26 weeks): ", ci_26, "\n")

## 95% CI for P(T > 26 weeks): 0.2009553

(c) Plot log(log Ŝ(t)) and log  (t) on the same graph, where Ĥ(t) is the Nelson-
Aalen estimate. Is there much difference?

• Nelson-Aalen estimate

na_fit <- survfit(coxph(surv_obj ~ 1), type = "aalen")


na_fit

## Call: survfit(formula = coxph(surv_obj ~ 1), type = "aalen")


##
## n events median 0.95LCL 0.95UCL
## [1,] 30 25 14 9 42

• Extract cumulative hazard estimates

4
na_df <- data.frame(time = na_fit$time, H_t = cumsum(na_fit$n.event / na_fit$n.risk))
na_df

## time H_t
## 1 1 0.06666667
## 2 2 0.10238095
## 3 4 0.17645503
## 4 6 0.29645503
## 5 7 0.34190957
## 6 8 0.38952862
## 7 9 0.48952862
## 8 10 0.54508418
## 9 12 0.60390770
## 10 13 0.66640770
## 11 14 0.73307437
## 12 18 0.80450294
## 13 19 0.88142602
## 14 24 0.96475935
## 15 26 1.05566844
## 16 29 1.15566844
## 17 31 1.15566844
## 18 42 1.28066844
## 19 45 1.28066844
## 20 50 1.28066844
## 21 57 1.48066844
## 22 60 1.73066844
## 23 71 1.73066844
## 24 85 1.73066844
## 25 91 2.73066844

• Plot log(-log(S(t))) and log(H(t))

km_df <- data.frame(time = km_fit$time, S_t = km_fit$surv)


km_df

## time S_t
## 1 1 0.9333333
## 2 2 0.9000000
## 3 4 0.8333333
## 4 6 0.7333333
## 5 7 0.7000000
## 6 8 0.6666667
## 7 9 0.6000000
## 8 10 0.5666667
## 9 12 0.5333333
## 10 13 0.5000000
## 11 14 0.4666667
## 12 18 0.4333333
## 13 19 0.4000000
## 14 24 0.3666667
## 15 26 0.3333333
## 16 29 0.3000000

5
## 17 31 0.3000000
## 18 42 0.2625000
## 19 45 0.2625000
## 20 50 0.2625000
## 21 57 0.2100000
## 22 60 0.1575000
## 23 71 0.1575000
## 24 85 0.1575000
## 25 91 0.0000000

km_df <- km_df[km_df$S_t > 0, ] # Remove zero survival values to avoid log(0)
km_df

## time S_t
## 1 1 0.9333333
## 2 2 0.9000000
## 3 4 0.8333333
## 4 6 0.7333333
## 5 7 0.7000000
## 6 8 0.6666667
## 7 9 0.6000000
## 8 10 0.5666667
## 9 12 0.5333333
## 10 13 0.5000000
## 11 14 0.4666667
## 12 18 0.4333333
## 13 19 0.4000000
## 14 24 0.3666667
## 15 26 0.3333333
## 16 29 0.3000000
## 17 31 0.3000000
## 18 42 0.2625000
## 19 45 0.2625000
## 20 50 0.2625000
## 21 57 0.2100000
## 22 60 0.1575000
## 23 71 0.1575000
## 24 85 0.1575000

ggplot() +
geom_line(data = km_df, aes(x = log(time), y = log(-log(S_t))), color = "blue", size = 1) +
geom_line(data = na_df, aes(x = log(time), y = log(H_t)), color = "red", size = 1) +
labs(title = "Comparison of Log-Transformed Survival and Hazard Functions",
x = "log(Time)", y = "log(-log(S(t))) / log(H(t))") +
theme_minimal()

## Warning: Using ‘size‘ aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use ‘linewidth‘ instead.
## This warning is displayed once every 8 hours.
## Call ‘lifecycle::last_lifecycle_warnings()‘ to see where this warning was
## generated.

6
Comparison of Log−Transformed Survival and Hazard Functions
1

0
log(−log(S(t))) / log(H(t))

−1

−2

0 1 2 3 4
log(Time)

--------------------------------------------------------------------------------------------------------------------

#Problem-3.2
The data below show survival times (in months) of patients with Hodgkin’s dis- ease who were treated
with nitrogen mustards (Bartolucci and Dickey, 1977). Group A patients received little or no prior therapy,
whereas Group B patients received heavy prior therapy. Starred observations are censoring times.
Group A 1.25, 1.41, 4.98, 5.25, 5.38, 6.92, 8.89, 10.98, 11.18, 13.11, 13.21, 16.33, 19.77, 21.08, 21.84*, 22.07,
31.38*, 32,62*, 37.18*, 42.92. Group B 1,05, 2.92, 3.61, 4,20, 4.49, 6.72, 7.31, 9.08, 9.11, 14.49*, 16.85,
18.82*, 26.59*, 30.26*, 41.34*.

1. Obtain and compare Kaplan-Meier estimates for the two groups. Does there appear to be a difference
in the 1-year survival probability for the two types of patients? Give confidence limits for S(1) and for
the median survival time 1.50 for each group.
2. Use plots of the Nelson-Aalen estimate (t) to examine and compare the two life distributions.
3. Do any parametric models whereby one tions suggest themselves? might compare the two distribu-

7
(1) Obtain and compare Kaplan-Meier estimates for the two groups. Does there
appear to be a difference in the 1-year survival probability for the two types of
patients? Give confidence limits for S(1) and for the median survival time 1.50
for each group.

• Survival times for Group A and B

group_A <- c(1.25, 1.41, 4.98, 5.25, 5.38, 6.92, 8.89, 10.98, 11.18, 13.11,
13.21, 16.33, 19.77, 21.08, 21.84, 22.07, 31.38, 32.62, 37.18, 42.92)
censor_A <- rep(1, length(group_A))
censor_A

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

length(censor_A)

## [1] 20

censor_A[c(15,17,18,19)]<-0 # 1 = event, 0 = censored


censor_A

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 1

group_B <- c(1.05, 2.92, 3.61, 4.20, 4.49, 6.72, 7.31, 9.08, 9.11, 14.49,16.85,18.82,26.59,30.26,41.34)
length(group_B)

## [1] 15

censor_B <- rep(1, length(group_B)) # Last value is censored


censor_B[c(10,12,13,14,15)]<-0
censor_B

## [1] 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0

• Create survival objects

surv_A <- Surv(group_A, censor_A);surv_A

## [1] 1.25 1.41 4.98 5.25 5.38 6.92 8.89 10.98 11.18 13.11
## [11] 13.21 16.33 19.77 21.08 21.84+ 22.07 31.38+ 32.62+ 37.18+ 42.92

surv_B <- Surv(group_B, censor_B);surv_B

## [1] 1.05 2.92 3.61 4.20 4.49 6.72 7.31 9.08 9.11 14.49+
## [11] 16.85 18.82+ 26.59+ 30.26+ 41.34+

• Fit Kaplan-Meier estimates

8
km_A <- survfit(surv_A ~ 1);km_A

## Call: survfit(formula = surv_A ~ 1)


##
## n events median 0.95LCL 0.95UCL
## [1,] 20 16 13.2 8.89 NA

summary(km_A)

## Call: survfit(formula = surv_A ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1.25 20 1 0.95 0.0487 0.859 1.000
## 1.41 19 1 0.90 0.0671 0.778 1.000
## 4.98 18 1 0.85 0.0798 0.707 1.000
## 5.25 17 1 0.80 0.0894 0.643 0.996
## 5.38 16 1 0.75 0.0968 0.582 0.966
## 6.92 15 1 0.70 0.1025 0.525 0.933
## 8.89 14 1 0.65 0.1067 0.471 0.897
## 10.98 13 1 0.60 0.1095 0.420 0.858
## 11.18 12 1 0.55 0.1112 0.370 0.818
## 13.11 11 1 0.50 0.1118 0.323 0.775
## 13.21 10 1 0.45 0.1112 0.277 0.731
## 16.33 9 1 0.40 0.1095 0.234 0.684
## 19.77 8 1 0.35 0.1067 0.193 0.636
## 21.08 7 1 0.30 0.1025 0.154 0.586
## 22.07 5 1 0.24 0.0980 0.108 0.534
## 42.92 1 1 0.00 NaN NA NA

km_B <- survfit(surv_B ~ 1);km_B

## Call: survfit(formula = surv_B ~ 1)


##
## n events median 0.95LCL 0.95UCL
## [1,] 15 10 9.08 4.49 NA

summary(km_B)

## Call: survfit(formula = surv_B ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1.05 15 1 0.933 0.0644 0.815 1.000
## 2.92 14 1 0.867 0.0878 0.711 1.000
## 3.61 13 1 0.800 0.1033 0.621 1.000
## 4.20 12 1 0.733 0.1142 0.540 0.995
## 4.49 11 1 0.667 0.1217 0.466 0.953
## 6.72 10 1 0.600 0.1265 0.397 0.907
## 7.31 9 1 0.533 0.1288 0.332 0.856
## 9.08 8 1 0.467 0.1288 0.272 0.802
## 9.11 7 1 0.400 0.1265 0.215 0.743
## 16.85 5 1 0.320 0.1239 0.150 0.684

9
• Plot Kaplan-Meier curves

ggsurvplot_combine(
list("Group A" = km_A, "Group B" = km_B),
conf.int = TRUE, risk.table = TRUE,
xlab = "Time (Months)", ylab = "Survival Probability",
title = "Kaplan-Meier Estimates for Group A and B",
legend.title = "Groups"
)

Kaplan−Meier Estimates for Group A and B


Groups + Group A::All + Group B::All

1.00
Survival Probability

0.75

0.50
+
+ + + + +
0.25 ++ +

0.00
0 10 20 30 40
Time (Months)
Number at risk
Groups

Group A::All 20 13 7 4 1
Group B::All 15 6 3 2 1
0 10 20 30 40
Time (Months)

• Compute 95% confidence intervals for S(1) and median survival time

summary(km_A, times = 1.50) # Confidence limits for Group A at 1.50 months

## Call: survfit(formula = surv_A ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1.5 18 2 0.9 0.0671 0.778 1

summary(km_B, times = 1.50) # Confidence limits for Group B at 1.50 months

## Call: survfit(formula = surv_B ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1.5 14 1 0.933 0.0644 0.815 1

10
(2) Use plots of the Nelson-Aalen estimate (t) to examine and compare the two
life distributions.
• Nelson-Aalen estimates for cumulative hazard

na_A <- survfit(surv_A ~ 1, type = "fleming-harrington");na_A

## Call: survfit(formula = surv_A ~ 1, type = "fleming-harrington")


##
## n events median 0.95LCL 0.95UCL
## [1,] 20 16 13.2 8.89 NA

na_B <- survfit(surv_B ~ 1, type = "fleming-harrington");na_B

## Call: survfit(formula = surv_B ~ 1, type = "fleming-harrington")


##
## n events median 0.95LCL 0.95UCL
## [1,] 15 10 9.08 4.49 NA

• Plot Nelson-Aalen estimates

plot(na_A$time, -log(km_A$surv), type = "s", col = "blue", lwd = 2,


xlab = "Time (Months)", ylab = "Cumulative Hazard",
main = "Nelson-Aalen Estimates")
lines(na_B$time, -log(km_B$surv), type = "s", col = "red", lwd = 2)
legend("topleft", legend = c("Group A", "Group B"), col = c("blue", "red"), lwd = 2)

Nelson−Aalen Estimates

Group A
Group B
1.2
Cumulative Hazard

0.8
0.4
0.0

0 10 20 30 40

Time (Months)

11
--------------------------------------------------------------------------------------------------------------------

#Problem-3.5
Pike (1966) gave results of a laboratory experiment in which 19 female rats were painted with the carcinogen
DMBA. The number of days T until the appearance of a carcinoma was of interest, and the data gave the
following times (asterisks denote censoring limes):
143, 164, 188, 188, 190, 192, 206, 209, 213, 216, 220, 227, 230, 234, 246, 265,304, 216, 244*

1. It was thought that carcinomas could not appear before some threshold time > 0, so a Weibull model
for T’ Ty was considered. Give twa Weibull probability plots, using (1) the raw data (1-values), and
(2) the values 1-100. Is there any strong indication that T-100 is closer to Weibull-distributed than is
T?
2. Obtain a nonparametric .95 confidence interval for the median time to car- cinoma, 1.50. Comment
on the advantages and disadvantages of this esti- mate over one based on a Weibull model.

(1) It was thought that carcinomas could not appear before some threshold time
> 0, so a Weibull model for T’ Ty was considered. Give twa Weibull probability
plots, using (1) the raw data (1-values), and (2) the values 1-100. Is there any
strong indication that T-100 is closer to Weibull-distributed than is T?
• Data: Observed times (T) and censoring indicator

times <- c(143, 164, 188, 188, 190, 192, 206, 209, 213, 216,
220, 227, 230, 234, 246, 265, 304, 216, 244)
times

## [1] 143 164 188 188 190 192 206 209 213 216 220 227 230 234 246 265 304 216 244

censored <- rep(1,length(times)) # Last two are censored


censored

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

length(censored)

## [1] 19

censored[c(18,19)]<-0
length(censored)

## [1] 19

censored

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0

• Transform data for Weibull plot (T’ = T - 100)

12
times_transformed <- times - 100
times_transformed

## [1] 43 64 88 88 90 92 106 109 113 116 120 127 130 134 146 165 204 116 144

• Weibull probability plot for raw data

weibull_plot_raw <- qqplot(qweibull(ppoints(times), shape=1), times,


main = "Weibull Probability Plot (Raw Data)",
xlab = "Theoretical Quantiles", ylab = "Observed Data")
abline(lm(times ~ qweibull(ppoints(times), shape=1)), col = "red") # Fit line

Weibull Probability Plot (Raw Data)


300
Observed Data

250
200
150

0 1 2 3

Theoretical Quantiles

weibull_plot_raw

## $x
## [1] 0.02666825 0.08223810 0.14107860 0.20359896 0.27029033 0.34174929
## [7] 0.41871033 0.50209194 0.59306372 0.69314718 0.80437282 0.92953596
## [13] 1.07263680 1.23969089 1.44036158 1.69167601 2.02814825 2.53897387
## [19] 3.63758616
##
## $y
## [1] 143 164 188 188 190 192 206 209 213 216 216 220 227 230 234 244 246 265 304

• Weibull probability plot for transformed data (T’ = T - 100)

13
weibull_plot_transformed <- qqplot(qweibull(ppoints(times_transformed), shape=1),
times_transformed,
main = "Weibull Probability Plot (T' = T - 100)",
xlab = "Theoretical Quantiles", ylab = "Observed Data")
abline(lm(times_transformed ~ qweibull(ppoints(times_transformed), shape=1)), col = "blue")

Weibull Probability Plot (T' = T − 100)


200
Observed Data

150
100
50

0 1 2 3

Theoretical Quantiles

weibull_plot_transformed

## $x
## [1] 0.02666825 0.08223810 0.14107860 0.20359896 0.27029033 0.34174929
## [7] 0.41871033 0.50209194 0.59306372 0.69314718 0.80437282 0.92953596
## [13] 1.07263680 1.23969089 1.44036158 1.69167601 2.02814825 2.53897387
## [19] 3.63758616
##
## $y
## [1] 43 64 88 88 90 92 106 109 113 116 116 120 127 130 134 144 146 165 204

length(censored)

## [1] 19

14
(2) Obtain a nonparametric .95 confidence interval for the median time to car-
cinoma, 1.50. Comment on the advantages and disadvantages of this esti- mate
over one based on a Weibull model.

• Kaplan-Meier estimation for median survival

surv_obj <- Surv(times, censored)


surv_obj

## [1] 143 164 188 188 190 192 206 209 213 216 220 227 230 234 246
## [16] 265 304 216+ 244+

km_fit <- survfit(surv_obj ~ 1)


km_fit

## Call: survfit(formula = surv_obj ~ 1)


##
## n events median 0.95LCL 0.95UCL
## [1,] 19 17 216 206 265

summary(km_fit)

## Call: survfit(formula = surv_obj ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 143 19 1 0.9474 0.0512 0.8521 1.000
## 164 18 1 0.8947 0.0704 0.7669 1.000
## 188 17 2 0.7895 0.0935 0.6259 0.996
## 190 15 1 0.7368 0.1010 0.5632 0.964
## 192 14 1 0.6842 0.1066 0.5041 0.929
## 206 13 1 0.6316 0.1107 0.4480 0.890
## 209 12 1 0.5789 0.1133 0.3946 0.850
## 213 11 1 0.5263 0.1145 0.3435 0.806
## 216 10 1 0.4737 0.1145 0.2949 0.761
## 220 8 1 0.4145 0.1145 0.2412 0.712
## 227 7 1 0.3553 0.1124 0.1911 0.661
## 230 6 1 0.2961 0.1082 0.1447 0.606
## 234 5 1 0.2368 0.1015 0.1023 0.548
## 246 3 1 0.1579 0.0934 0.0495 0.504
## 265 2 1 0.0789 0.0728 0.0130 0.481
## 304 1 1 0.0000 NaN NA NA

• 95% confidence interval for median time

km_ci <- quantile(km_fit, probs = c(0.5), conf.int = TRUE)


km_ci

## $quantile
## 50
## 216

15
##
## $lower
## 50
## 206
##
## $upper
## 50
## 265

• Kaplan-Meier plot

ggsurvplot(km_fit,
data = data.frame(times = times, censored = censored), # Add this line with the data
conf.int = TRUE,
xlab = "Time (Days)",
ylab = "Survival Probability",
title = "Kaplan-Meier Estimate of Carcinoma Time")

Kaplan−Meier Estimate of Carcinoma Time


Strata + All

1.00
Survival Probability

0.75

0.50 +

0.25 +

0.00
0 100 200 300
Time (Days)
--------------------------------------------------------------------------------------------------------------------

#Problem-3.14
The following data are survival times for 121 breast cancer patients treated over the period 1929-1938, quoted
in Boag (1949). Times are in months, and asterisks denote censoring times.

16
0.3, 0.3*, 4, 5, 5.6, 6.2 6.3, 6.6 6.8, 7.4*, 7.5, 8.4, 8.4, 10.3, 11, 11.8, 12.2, 12.3, 13.5, 14.4, 14.4, 14.8, 15.5*,
15.7, 16.2, 16.3, 16.5, 16.8, 17.2, 17.3, 17.5, 17.9, 19.8 20.4, 20.9, 21, 21, 21.1, 23, 23.4*, 23.6, 24, 24, 27.9,
28.2, 29.1, 30, 31, 31, 32, 35, 35, 37*, 37*, 37*, 38, 38*, 38*, 39*, 39*, 40, 40*, 40*, 41, 41, 41*, 42, 43**,
43*, 43*, 44, 45*, 45*, 46*, 46, 47*, 48, 49*, 51, 51, 51*, 52, 54, 55*, 56, 57*, 58**, 59**, 60, 60”, 60*, 61*,
62*, 65*, 65*, 67, 67, 68*, 69, 78, 80, 83*, 88*, 89, 90, 93, 96**, 103, 105*, 109**, 109*, 111*, 115*, 117,
125, 126, 127, 129*, 129* 139*, 154*,

1. Calculate the Kaplan-Meier estimate of the survivor function. Estimate 1- and 5-year survival proba-
bilities and give a standard error for these esti- mates.
2. Group the data into a life table with 1-year intervals. Compare the 1- and 5-year survival probability
estimates with those obtained in part (a),

(1) Calculate the Kaplan-Meier estimate of the survivor function. Estimate 1-


and 5-year survival probabilities and give a standard error for these esti- mates.

• Define the survival data (times and censoring status)

times <- c(0.3, 0.3, 4.0, 5.0, 5.6, 6.2, 6.3, 6.6, 6.8, 7.4, 7.5, 8.4, 8.4, 10.3, 11.0, 11.8, 12.2, 12.3
length(times)

## [1] 121

censored <- rep(1,length(times))


length(censored)

## [1] 121

censored[c(2,3,10,23,40,53,54,55,57,58,59,60,62,63,66,68,69,70,72,73,74,75,76,78,81,84,86,87,88,90,91,92

• Create a survival object

surv_obj <- Surv(times, censored)

• Kaplan-Meier estimate

km_fit <- survfit(surv_obj ~ 1)


km_fit

## Call: survfit(formula = surv_obj ~ 1)


##
## n events median 0.95LCL 0.95UCL
## [1,] 121 65 52 40 90

summary(km_fit)

17
## Call: survfit(formula = surv_obj ~ 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.3 121 1 0.992 0.00823 0.976 1.000
## 5.0 118 1 0.983 0.01169 0.961 1.000
## 5.6 117 1 0.975 0.01429 0.947 1.000
## 6.2 116 1 0.967 0.01646 0.935 0.999
## 6.3 115 1 0.958 0.01834 0.923 0.995
## 6.6 114 1 0.950 0.02001 0.911 0.990
## 6.8 113 1 0.941 0.02152 0.900 0.984
## 7.5 111 1 0.933 0.02294 0.889 0.979
## 8.4 110 2 0.916 0.02547 0.867 0.967
## 10.3 108 1 0.907 0.02660 0.857 0.961
## 11.0 107 1 0.899 0.02767 0.846 0.955
## 11.8 106 1 0.890 0.02868 0.836 0.948
## 12.2 105 1 0.882 0.02964 0.826 0.942
## 12.3 104 1 0.873 0.03054 0.816 0.935
## 13.5 103 1 0.865 0.03140 0.806 0.929
## 14.4 102 2 0.848 0.03300 0.786 0.915
## 14.8 100 1 0.840 0.03374 0.776 0.908
## 15.7 98 1 0.831 0.03446 0.766 0.901
## 16.2 97 1 0.822 0.03516 0.756 0.894
## 16.3 96 1 0.814 0.03582 0.747 0.887
## 16.5 95 1 0.805 0.03645 0.737 0.880
## 16.8 94 1 0.797 0.03706 0.727 0.873
## 17.2 93 1 0.788 0.03764 0.718 0.865
## 17.3 92 1 0.780 0.03819 0.708 0.858
## 17.5 91 1 0.771 0.03872 0.699 0.851
## 17.9 90 1 0.762 0.03923 0.689 0.843
## 19.8 89 1 0.754 0.03971 0.680 0.836
## 20.4 88 1 0.745 0.04017 0.671 0.828
## 20.9 87 1 0.737 0.04061 0.661 0.821
## 21.0 86 2 0.720 0.04144 0.643 0.806
## 21.1 84 1 0.711 0.04182 0.634 0.798
## 23.0 83 1 0.702 0.04218 0.624 0.790
## 23.6 81 1 0.694 0.04254 0.615 0.782
## 24.0 80 2 0.676 0.04321 0.597 0.767
## 27.9 78 1 0.668 0.04352 0.588 0.759
## 28.2 77 1 0.659 0.04381 0.579 0.751
## 29.1 76 1 0.650 0.04408 0.570 0.743
## 30.0 75 1 0.642 0.04434 0.560 0.735
## 31.0 74 2 0.624 0.04481 0.542 0.719
## 32.0 72 1 0.616 0.04502 0.534 0.711
## 35.0 71 2 0.598 0.04539 0.516 0.694
## 38.0 66 1 0.589 0.04560 0.506 0.686
## 40.0 61 1 0.580 0.04586 0.496 0.677
## 41.0 58 2 0.560 0.04641 0.476 0.658
## 42.0 55 1 0.550 0.04667 0.465 0.649
## 44.0 51 1 0.539 0.04698 0.454 0.639
## 48.0 45 1 0.527 0.04744 0.442 0.628
## 51.0 43 2 0.502 0.04829 0.416 0.606
## 52.0 40 1 0.490 0.04869 0.403 0.595
## 54.0 39 1 0.477 0.04903 0.390 0.584
## 56.0 37 1 0.464 0.04937 0.377 0.572

18
## 60.0 33 1 0.450 0.04984 0.362 0.559
## 78.0 22 1 0.430 0.05161 0.340 0.544
## 80.0 21 1 0.409 0.05305 0.317 0.528
## 89.0 18 1 0.387 0.05476 0.293 0.510
## 90.0 17 1 0.364 0.05606 0.269 0.492
## 126.0 6 1 0.303 0.07243 0.190 0.484

• Plot the Kaplan-Meier curve

ggsurvplot(km_fit,
data = data.frame(times = times, censored = censored), conf.int = TRUE,
xlab = "Time (Months)", ylab = "Survival Probability",
title = "Kaplan-Meier Estimate of Survival",
risk.table = TRUE)

Kaplan−Meier Estimate of Survival


Strata + All

1.00 +++
Survival Probability

+
0.75 +
++++++
0.50 +++++
+++++++++++
+ + ++ ++ ++ ++ +
++ + +
0.25

0.00
0 50 100 150
Time (Months)
Number at risk
Strata

All 121 43 14 1
0 50 100 150
Time (Months)

• Calculate the 1-year (12 months) and 5-year (60 months) survival probabilities

km_fit$surv

## [1] 0.9917355 0.9917355 0.9833310 0.9749265 0.9665219 0.9581174 0.9497128


## [8] 0.9413083 0.9413083 0.9328281 0.9158675 0.9073873 0.8989070 0.8904268
## [15] 0.8819465 0.8734663 0.8649860 0.8480255 0.8395452 0.8395452 0.8309785
## [22] 0.8224117 0.8138449 0.8052781 0.7967113 0.7881445 0.7795777 0.7710109
## [29] 0.7624442 0.7538774 0.7453106 0.7367438 0.7196102 0.7110434 0.7024766

19
## [36] 0.7024766 0.6938041 0.6764590 0.6677864 0.6591139 0.6504413 0.6417688
## [43] 0.6244237 0.6157511 0.5984060 0.5984060 0.5893393 0.5893393 0.5796780
## [50] 0.5596891 0.5495129 0.5495129 0.5387381 0.5387381 0.5387381 0.5387381
## [57] 0.5267662 0.5267662 0.5022654 0.4897088 0.4771522 0.4771522 0.4642562
## [64] 0.4642562 0.4642562 0.4642562 0.4501878 0.4501878 0.4501878 0.4501878
## [71] 0.4501878 0.4501878 0.4501878 0.4297247 0.4092616 0.4092616 0.4092616
## [78] 0.3865249 0.3637881 0.3637881 0.3637881 0.3637881 0.3637881 0.3637881
## [85] 0.3637881 0.3637881 0.3637881 0.3637881 0.3031568 0.3031568 0.3031568
## [92] 0.3031568 0.3031568

km_fit$time

## [1] 0.3 4.0 5.0 5.6 6.2 6.3 6.6 6.8 7.4 7.5 8.4 10.3
## [13] 11.0 11.8 12.2 12.3 13.5 14.4 14.8 15.5 15.7 16.2 16.3 16.5
## [25] 16.8 17.2 17.3 17.5 17.9 19.8 20.4 20.9 21.0 21.1 23.0 23.4
## [37] 23.6 24.0 27.9 28.2 29.1 30.0 31.0 32.0 35.0 37.0 38.0 39.0
## [49] 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 51.0 52.0
## [61] 54.0 55.0 56.0 57.0 58.0 59.0 60.0 61.0 62.0 65.0 67.0 68.0
## [73] 69.0 78.0 80.0 83.0 88.0 89.0 90.0 93.0 96.0 103.0 105.0 109.0
## [85] 111.0 115.0 117.0 125.0 126.0 127.0 129.0 139.0 154.0

km_fit$surv[km_fit$time == 12] # 1-year survival

## numeric(0)

km_fit$surv[km_fit$time == 60] # 5-year survival

## [1] 0.4501878

(2) Group the data into a life table with 1-year intervals. Compare the 1- and
5-year survival probability estimates with those obtained in part (a),

• Using the survfit object to calculate the survival at specific time points
• This can be used to generate life table estimates

summary(km_fit, times = seq(1, 5, by = 1)) # For 1-year intervals

## Call: survfit(formula = surv_obj ~ 1)


##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 119 1 0.992 0.00823 0.976 1
## 2 119 0 0.992 0.00823 0.976 1
## 3 119 0 0.992 0.00823 0.976 1
## 4 119 0 0.992 0.00823 0.976 1
## 5 118 1 0.983 0.01169 0.961 1

20

You might also like