Exercise 1: Life tables and Kaplan-Meier estimates of survival
(a) Hand calculation: Life table and Kaplan-Meier estimates of survival
The results are contained in an Excel file and are also shown in the R output below.
(b) Using R to validate the hand calculations done in part 1 (a)
First, load the biostat3 library for the dataset ,
survminer for plots and knitr for nice html
output:
Following are the life table estimates. Note that in the lectures, when we estimated all-cause survival, there were 8 deaths in the first interval. One of these died of a cause other than cancer so in the cause-specific survival analysis we see that there are 7 ‘deaths’ and 1 censoring (Stata uses the term ‘lost’ for lost to follow-up) in the first interval.
lifetab2(Surv(floor(surv_yy), status == "Dead: cancer")~1,
         colon_sample, breaks=0:10) |>
    kable("html", digits=2)| tstart | tstop | nsubs | nlost | nrisk | nevent | surv | hazard | se.surv | se.pdf | se.hazard | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0-1 | 0 | 1 | 35 | 1 | 34.5 | 7 | 1.00 | 0.20 | 0.23 | 0.00 | 0.07 | 0.08 | 
| 1-2 | 1 | 2 | 27 | 3 | 25.5 | 1 | 0.80 | 0.03 | 0.04 | 0.07 | 0.03 | 0.04 | 
| 2-3 | 2 | 3 | 23 | 4 | 21.0 | 5 | 0.77 | 0.18 | 0.27 | 0.07 | 0.07 | 0.12 | 
| 3-4 | 3 | 4 | 14 | 1 | 13.5 | 2 | 0.58 | 0.09 | 0.16 | 0.09 | 0.06 | 0.11 | 
| 4-5 | 4 | 5 | 11 | 1 | 10.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN | 
| 5-6 | 5 | 6 | 10 | 0 | 10.0 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN | 
| 6-7 | 6 | 7 | 10 | 3 | 8.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN | 
| 7-8 | 7 | 8 | 7 | 1 | 6.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN | 
| 8-9 | 8 | 9 | 6 | 4 | 4.0 | 1 | 0.50 | 0.12 | 0.29 | 0.10 | 0.11 | 0.28 | 
| 9-10 | 9 | 10 | 1 | 1 | 0.5 | 0 | 0.37 | 0.00 | 0.00 | 0.13 | NaN | NaN | 
| 10-Inf | 10 | Inf | 0 | 0 | 0.0 | 0 | 0.37 | NA | NA | 0.13 | NA | NA | 
Following is a table of Kaplan-Meier estimates. Although it’s not clear from the table, the person censored (lost) at time 2 was at risk when the other person dies at time 2. On the following is a graph of the survival function.
mfit <- survfit(Surv(surv_mm/12, status == "Dead: cancer") ~ 1,
                data = colon_sample) # make Kaplan-Meier estimates
summary(mfit) # print Kaplan-Meier table## Call: survfit(formula = Surv(surv_mm/12, status == "Dead: cancer") ~ 
##     1, data = colon_sample)
## 
##   time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  0.167     35       1    0.971  0.0282        0.918        1.000
##  0.250     33       1    0.942  0.0398        0.867        1.000
##  0.417     32       1    0.913  0.0482        0.823        1.000
##  0.583     31       1    0.883  0.0549        0.782        0.998
##  0.667     30       1    0.854  0.0605        0.743        0.981
##  0.750     29       1    0.824  0.0652        0.706        0.962
##  0.917     28       1    0.795  0.0692        0.670        0.943
##  1.833     24       1    0.762  0.0738        0.630        0.921
##  2.250     22       1    0.727  0.0781        0.589        0.898
##  2.333     20       1    0.691  0.0823        0.547        0.872
##  2.667     19       2    0.618  0.0882        0.467        0.818
##  2.750     16       1    0.579  0.0908        0.426        0.788
##  3.583     13       1    0.535  0.0941        0.379        0.755
##  3.833     12       1    0.490  0.0962        0.334        0.720
##  8.500      4       1    0.368  0.1284        0.185        0.729plot(mfit,    # plot Kaplan-Meier curve
     ylab="S(t)",
     xlab="Time since diagnosis (years)",
     main = "Kaplan−Meier estimates of cause−specific survival")
ggsurvplot(mfit, # plot Kaplan-Meier curve
     ylab="S(t)",
     xlab="Time since diagnosis (years)",
     main = "Kaplan−Meier estimates of cause−specific survival",
     risk.table = TRUE,
     conf.int = TRUE,
     ggtheme = theme_minimal())