Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data
We load the dependencies:
Then load the data and define an indicator:
localised <- subset(biostat3::melanoma, stage=="Localised") |>
    transform(death_cancer = ifelse(status == "Dead: cancer", 1, 0))Then we show the results using the actuarial estimator with years:
| tstart | tstop | nsubs | nlost | nrisk | nevent | surv | |
|---|---|---|---|---|---|---|---|
| 0-1 | 0 | 1 | 5318 | 81 | 5277.5 | 71 | 1.0000000 | 
| 1-2 | 1 | 2 | 5166 | 400 | 4966.0 | 228 | 0.9865467 | 
| 2-3 | 2 | 3 | 4538 | 381 | 4347.5 | 202 | 0.9412521 | 
| 3-4 | 3 | 4 | 3955 | 344 | 3783.0 | 138 | 0.8975183 | 
| 4-5 | 4 | 5 | 3473 | 312 | 3317.0 | 100 | 0.8647777 | 
| 5-6 | 5 | 6 | 3061 | 298 | 2912.0 | 80 | 0.8387066 | 
| 6-7 | 6 | 7 | 2683 | 267 | 2549.5 | 56 | 0.8156652 | 
| 7-8 | 7 | 8 | 2360 | 293 | 2213.5 | 35 | 0.7977491 | 
| 8-9 | 8 | 9 | 2032 | 275 | 1894.5 | 34 | 0.7851350 | 
| 9-10 | 9 | 10 | 1723 | 243 | 1601.5 | 16 | 0.7710445 | 
| 10-11 | 10 | 11 | 1464 | 197 | 1365.5 | 18 | 0.7633412 | 
| 11-12 | 11 | 12 | 1249 | 189 | 1154.5 | 17 | 0.7532789 | 
| 12-13 | 12 | 13 | 1043 | 161 | 962.5 | 2 | 0.7421869 | 
| 13-14 | 13 | 14 | 880 | 186 | 787.0 | 4 | 0.7406447 | 
| 14-15 | 14 | 15 | 690 | 153 | 613.5 | 3 | 0.7368803 | 
| 15-16 | 15 | 16 | 534 | 110 | 479.0 | 2 | 0.7332769 | 
| 16-17 | 16 | 17 | 422 | 111 | 366.5 | 5 | 0.7302152 | 
| 17-18 | 17 | 18 | 306 | 97 | 257.5 | 1 | 0.7202532 | 
| 18-19 | 18 | 19 | 208 | 81 | 167.5 | 1 | 0.7174561 | 
| 19-20 | 19 | 20 | 126 | 65 | 93.5 | 0 | 0.7131728 | 
| 20-Inf | 20 | Inf | 61 | 61 | 30.5 | 0 | 0.7131728 | 
Similarly, we use the actuarial estimator using months:
| tstart | tstop | nsubs | nlost | nrisk | nevent | surv | |
|---|---|---|---|---|---|---|---|
| 109-110 | 109 | 110 | 1699 | 27 | 1685.5 | 1 | 0.7701209 | 
| 110-111 | 110 | 111 | 1671 | 16 | 1663.0 | 1 | 0.7696640 | 
| 111-112 | 111 | 112 | 1654 | 26 | 1641.0 | 1 | 0.7692012 | 
| 112-113 | 112 | 113 | 1627 | 27 | 1613.5 | 1 | 0.7687325 | 
| 113-114 | 113 | 114 | 1599 | 19 | 1589.5 | 0 | 0.7682560 | 
| 114-115 | 114 | 115 | 1580 | 21 | 1569.5 | 0 | 0.7682560 | 
| 115-116 | 115 | 116 | 1559 | 26 | 1546.0 | 1 | 0.7682560 | 
| 116-117 | 116 | 117 | 1532 | 20 | 1522.0 | 2 | 0.7677591 | 
| 117-118 | 117 | 118 | 1510 | 14 | 1503.0 | 1 | 0.7667502 | 
| 118-119 | 118 | 119 | 1495 | 14 | 1488.0 | 4 | 0.7662401 | 
| 119-120 | 119 | 120 | 1477 | 12 | 1471.0 | 1 | 0.7641803 | 
| 120-121 | 120 | 121 | 1464 | 11 | 1458.5 | 1 | 0.7636608 | 
| 121-122 | 121 | 122 | 1452 | 9 | 1447.5 | 4 | 0.7631372 | 
| 122-123 | 122 | 123 | 1439 | 13 | 1432.5 | 2 | 0.7610284 | 
| 123-124 | 123 | 124 | 1424 | 15 | 1416.5 | 4 | 0.7599658 | 
| 124-125 | 124 | 125 | 1405 | 25 | 1392.5 | 0 | 0.7578198 | 
| 125-126 | 125 | 126 | 1380 | 15 | 1372.5 | 0 | 0.7578198 | 
| 126-127 | 126 | 127 | 1365 | 16 | 1357.0 | 2 | 0.7578198 | 
| 127-128 | 127 | 128 | 1347 | 25 | 1334.5 | 2 | 0.7567029 | 
| 128-129 | 128 | 129 | 1320 | 15 | 1312.5 | 0 | 0.7555688 | 
| 129-130 | 129 | 130 | 1305 | 16 | 1297.0 | 1 | 0.7555688 | 
Then the code using the Kaplan-Meier estimator with years:
## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.5   5318      71    0.987 0.00157        0.984        0.990
##   1.5   5166     228    0.943 0.00320        0.937        0.949
##   2.5   4538     202    0.901 0.00420        0.893        0.909
##   3.5   3955     138    0.870 0.00483        0.860        0.879
##   4.5   3473     100    0.845 0.00530        0.834        0.855
##   5.5   3061      80    0.823 0.00571        0.811        0.834
##   6.5   2683      56    0.805 0.00603        0.794        0.817
##   7.5   2360      35    0.793 0.00627        0.781        0.806
##   8.5   2032      34    0.780 0.00657        0.767        0.793
##   9.5   1723      16    0.773 0.00675        0.760        0.786
##  10.5   1464      18    0.763 0.00703        0.750        0.777
##  11.5   1249      17    0.753 0.00737        0.739        0.768
##  12.5   1043       2    0.752 0.00743        0.737        0.766
##  13.5    880       4    0.748 0.00759        0.733        0.763
##  14.5    690       3    0.745 0.00779        0.730        0.760
##  15.5    534       2    0.742 0.00800        0.727        0.758
##  16.5    422       5    0.733 0.00882        0.716        0.751
##  17.5    306       1    0.731 0.00911        0.713        0.749
##  18.5    208       1    0.727 0.00972        0.709        0.747And the Kaplan-Meier estimator with data in months:
mfit_months <- survfit(Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
summary(mfit_months,times=(110:130)/12)## Call: survfit(formula = Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
## 
##   time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   9.17   1671     948    0.770 0.00684        0.757        0.784
##   9.25   1654       1    0.770 0.00685        0.757        0.784
##   9.33   1627       1    0.770 0.00686        0.756        0.783
##   9.42   1599       1    0.769 0.00687        0.756        0.783
##   9.50   1580       0    0.769 0.00687        0.756        0.783
##   9.58   1559       0    0.769 0.00687        0.756        0.783
##   9.67   1532       1    0.769 0.00688        0.755        0.782
##   9.75   1510       2    0.768 0.00691        0.754        0.781
##   9.83   1495       1    0.767 0.00693        0.754        0.781
##   9.92   1477       4    0.765 0.00698        0.751        0.779
##  10.00   1464       1    0.764 0.00700        0.751        0.778
##  10.08   1452       1    0.764 0.00701        0.750        0.778
##  10.17   1439       4    0.762 0.00707        0.748        0.776
##  10.25   1424       2    0.761 0.00710        0.747        0.775
##  10.33   1405       4    0.759 0.00716        0.745        0.773
##  10.42   1380       0    0.759 0.00716        0.745        0.773
##  10.50   1365       0    0.759 0.00716        0.745        0.773
##  10.58   1347       2    0.758 0.00719        0.744        0.772
##  10.67   1320       2    0.756 0.00723        0.742        0.771
##  10.75   1305       0    0.756 0.00723        0.742        0.771
##  10.83   1288       1    0.756 0.00724        0.742        0.770(a)
The actuarial method is most appropriate because it deals with ties (events and censorings at the same time) in a more appropriate manner. The fact that there are a reasonably large number of ties in these data means that there is a difference between the estimates.
(b)
The K-M estimate changes more. Because the actuarial method deals with ties in an appropriate manner it is not biased when data are heavily tied so is not heavily affected when we reduce the number of ties.
(c)
The plot clearly shows that the Kaplan-Meier estimator with the aggregated data is upwardly biased compared with the other curves.
plot(mfit_months, conf.int=FALSE,
     ylim=c(0.7,1), cex=5,
     xlab="Time from cancer diagnosis (years)",
     ylab="Survival")
survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised) |>
    lines(col="red", conf.int=FALSE)
lifetab2(Surv(surv_mm/12,death_cancer)~1, data = localised) |>
    lines(col="orange")
lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised) |>
    lines(col="green")
legend("topright",
       legend=c("KM (months)", "KM (years)", "Actuarial (months)", "Actuarial (years)"),
       lty=1,
       col=c("black", "red", "orange", "green"),
       bty="n")