# Introduction

The purpose of this vignette is to illustrate the use of {`icpack`}, using the right-censored `Ova` data, and to contrast the smooth modeling using {`icpack`} with the more standard non- and semi-parametric approaches of the `survival` package.

# Ovarian cancer data

``````library(icpack)
``````

## Preliminaries

There are three covariates (see the help file). Here is a summary of their frequencies and distribution.

``````table(Ova\$Diameter)
``````
``````##
##   0   1   2   3   4
##  29  67  49  68 145
``````
``````table(Ova\$FIGO)
``````
``````##
##   0   1
## 262  96
``````
``````table(Ova\$Karnofsky)
``````
``````##
##   6   7   8   9  10
##  20  46  47 108 137
``````

`Diameter` is the diameter of the tumor, `FIGO` is the FIGO classification, where in fact `FIGO=0` corresponds to a FIGO III and `FIGO=1` to a FIGO IV classification, and `Karnofsky` is the often used Karnofsky score divided by 10, so for instance `Karnofsky=9` corresponds to a Karnofksy score of 90.

## Using icfit

The time-to-event data are right-censored. Both `Diameter` and `Karnofsky` are ordinal, so we could use them both as categorical and as continuous variables. Let’s first use them both as categorical, and see whether it makes sense to use them as continuous variables by inspecting their (log) hazard ratios.

``````ic_fit <- icfit(Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky),
data = Ova)
ic_fit
``````
``````## Object of class 'icfit'
## Call:
## icfit(formula = Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky),
##     data = Ova)
##
## Bins summary: tmin = 0, tmax = 2756, number of bins = 100, bin width = 27.56
## P-splines summary: number of segments = 20, degree = 3, penalty order = 2
##
## Parameter estimates:
##                        coef      SE      HR   lower upper   pvalue
## factor(Diameter)1    0.3446  0.3204  1.4114  0.7532 2.645  0.28217
## factor(Diameter)2    0.8383  0.3237  2.3123  1.2261 4.361  0.00961 **
## factor(Diameter)3    0.9130  0.3135  2.4919  1.3479 4.607  0.00359 **
## factor(Diameter)4    0.9389  0.3011  2.5573  1.4173 4.614  0.00182 **
## FIGO                 0.5397  0.1368  1.7156  1.3121 2.243 7.95e-05 ***
## factor(Karnofsky)7  -0.3239  0.2814  0.7233  0.4167 1.256  0.24973
## factor(Karnofsky)8  -0.7132  0.2909  0.4901  0.2771 0.867  0.01422 *
## factor(Karnofsky)9  -0.8391  0.2694  0.4321  0.2549 0.733  0.00184 **
## factor(Karnofsky)10 -0.8242  0.2646  0.4386  0.2611 0.737  0.00184 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Effective baseline dimension: 2.779, log-likelihood: -1251, AIC: 2507
## Smoothness parameter lambda: 236.6
## Number of iterations: 24
## n = 358, number of events = 266
``````

It looks like both for `Diameter` and `Karnofsky` the increments of the log hazard ratios are reasonably equal for neighboring factor levels, which could argue for using both as continuous covariates in the future. Before we do that, however, let’s compare the result with that from `coxph` in the `survival` package.

``````cox_fit <- coxph(Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky), data = Ova)
cox_fit
``````
``````## Call:
## coxph(formula = Surv(time, d) ~ factor(Diameter) + FIGO + factor(Karnofsky),
##     data = Ova)
##
##                        coef exp(coef) se(coef)      z        p
## factor(Diameter)1    0.3452    1.4123   0.3205  1.077  0.28147
## factor(Diameter)2    0.8494    2.3383   0.3237  2.624  0.00870
## factor(Diameter)3    0.9204    2.5103   0.3136  2.935  0.00333
## factor(Diameter)4    0.9381    2.5551   0.3014  3.113  0.00185
## FIGO                 0.5471    1.7282   0.1367  4.002 6.28e-05
## factor(Karnofsky)7  -0.3662    0.6934   0.2821 -1.298  0.19424
## factor(Karnofsky)8  -0.7477    0.4735   0.2914 -2.566  0.01029
## factor(Karnofsky)9  -0.8729    0.4177   0.2700 -3.233  0.00122
## factor(Karnofsky)10 -0.8643    0.4213   0.2653 -3.258  0.00112
##
## Likelihood ratio test=70.54  on 9 df, p=1.193e-11
## n= 358, number of events= 266
``````

The results, in terms of the hazard ratios and their standard errors, are very similar indeed.

Let’s refit the smooth and semi-parametric model, now using `Diameter` and `Karnofsky` as continuous variables.

``````ic_fit <- icfit(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
cox_fit <- coxph(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
ic_fit
``````
``````## Object of class 'icfit'
## Call:
## icfit(formula = Surv(time, d) ~ Diameter + FIGO + Karnofsky,
##     data = Ova)
##
## Bins summary: tmin = 0, tmax = 2756, number of bins = 100, bin width = 27.56
## P-splines summary: number of segments = 20, degree = 3, penalty order = 2
##
## Parameter estimates:
##               coef       SE       HR    lower upper   pvalue
## Diameter   0.20057  0.04947  1.22209  1.10917 1.347 5.03e-05 ***
## FIGO       0.53863  0.13549  1.71366  1.31400 2.235 7.03e-05 ***
## Karnofsky -0.16843  0.05310  0.84499  0.76148 0.938  0.00151 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Effective baseline dimension: 2.739, log-likelihood: -1254, AIC: 2513
## Smoothness parameter lambda: 250.2
## Number of iterations: 25
## n = 358, number of events = 266
``````
``````cox_fit
``````
``````## Call:
## coxph(formula = Surv(time, d) ~ Diameter + FIGO + Karnofsky,
##     data = Ova)
##
##               coef exp(coef) se(coef)      z       p
## Diameter   0.20018   1.22162  0.04947  4.047 5.2e-05
## FIGO       0.54183   1.71915  0.13529  4.005 6.2e-05
## Karnofsky -0.17173   0.84221  0.05319 -3.228 0.00125
##
## Likelihood ratio test=64.15  on 3 df, p=7.64e-14
## n= 358, number of events= 266
``````

Without claiming in any way that using `Diameter` and `Karnofsky` as continuous variables is better, we continue with these last two models, mainly for simplicity.

## Plotting

The main difference between the smooth and semi-parametric fits of course resides in the estimates of the baseline hazard. We illustrate this by extracting the baseline hazards and plotting the corresponding baseline survival curves.

The `icpack` package uses `ggplot2` for plotting. The default plot is the (baseline) hazard; by setting `type="survival"` we get the (baseline) survival function. Default option is to add shaded pointwise confidence intervals; they have been switched off here.

``````plot(ic_fit, type="survival", conf.int = FALSE,
title = "Baseline survival function from icfit")
``````

Here is the corresponding baseline survival function from `coxph`, with the smooth baseline survival curve from `icfit` added to it in blue.

``````bh <- basehaz(cox_fit, centered=FALSE)
bh\$surv <- exp(-bh\$hazard)
plot(c(0, bh\$time), c(1, bh\$surv), type="s",
xlab="Time", ylab="Survival", main="Baseline survival functions from coxph and icfit")
nd_baseline <- data.frame(Diameter = 0, FIGO = 0, Karnofsky = 0)
pred_baseline <- predict(ic_fit, newdata = nd_baseline)[[1]]
lines(pred_baseline\$time, pred_baseline\$Surv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
``````

The plot does not show pointwise confidence intervals; it is possible to obtain those using `predict`, which is illustrated in the next section. Obviously, the estimated baseline survival curve form `coxph` is less smooth; their values however are very similar.

The estimate of the baseline hazard from the Cox model fit is not smooth; plotting it does not produce anything remotely useful.

``````plot(bh\$time, diff(c(0, bh\$hazard)),
xlab="Time", ylab="Hazard", main="Baseline hazard function from coxph")
``````

The baseline hazard obtained from `icfit` is smooth and does give a useful illustration of the hazard, along with its pointwise confidence intervals.

``````plot(ic_fit, type = 'hazard', title = 'Baseline hazard function from icfit')
``````

The cumulative hazard can also be shown using `type="cumhazard"`.

## Predicting

Actually the baseline hazards and survival function are not really that interesting, in the sense that they correspond to a subject with `Diameter=0`, `FIGO=0` and `Karnofsky=0`. Such persons (at least `Karnofsky=0`) are not in the data and are even biologically impossible.

Modeled after the `survival` package, the function `predict` with a `newdata` object can be used to obtain hazards and survival functions for subjects with given characteristics. Let us illustrate this with subject A, who has `Diameter=0`, `FIGO=0` and `Karnofsky=10`, the most optimistic possibility, and subject B, who has `Diameter=4`, `FIGO=0` and `Karnofsky=6`, the most pessimistic possibility, apart from FIGO. This is simply entered into a data frame with the same names as the original data.

``````newd <- data.frame(subject = c("A", "B"),
Diameter = c(0, 4),
FIGO = c(0, 0),
Karnofsky = c(10, 6))
newd
``````
``````##   subject Diameter FIGO Karnofsky
## 1       A        0    0        10
## 2       B        4    0         6
``````
``````picf <- predict(ic_fit, newdata=newd)
``````

The result is an object of class `predict.icfit`, which is a list with 2 items (one for each subject), each containing a data frame with time points and hazards, cumulative hazards, survival functions at these time points with associated standard errors and confidence bounds. The `summary` method can be used to extract the hazards etc at specified time points.

``````summary(picf, times=(0:7) * 365.25)
``````
``````## [[1]]
##      time          haz        sehaz     lowerhaz     upperhaz       Haz
## 1    0.00 0.0002748966 5.985453e-05 1.794045e-04 0.0004212166 0.0000000
## 2  365.25 0.0003757599 6.227895e-05 2.715383e-04 0.0005199837 0.1190020
## 3  730.50 0.0004032806 6.606705e-05 2.925231e-04 0.0005559742 0.2653842
## 4 1095.75 0.0003192450 5.448410e-05 2.284828e-04 0.0004460616 0.3990633
## 5 1461.00 0.0002341901 4.440138e-05 1.615045e-04 0.0003395881 0.4990482
## 6 1826.25 0.0001838503 4.416871e-05 1.148073e-04 0.0002944148 0.5745981
## 7 2191.50 0.0001426012 5.385327e-05 6.802511e-05 0.0002989378 0.6342641
## 8 2556.75 0.0001053165 6.677041e-05 3.039765e-05 0.0003648876 0.6792628
##        seHaz   lowerHaz  upperHaz      Surv     seSurv lowerSurv upperSurv
## 1 0.00000000 0.00000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.02074635 0.07833988 0.1596641 0.8878064 0.01841863 0.8524307 0.9246503
## 3 0.04241678 0.18224886 0.3485196 0.7669117 0.03252979 0.7057328 0.8333942
## 4 0.06139357 0.27873415 0.5193925 0.6709484 0.04119187 0.5948821 0.7567412
## 5 0.07527367 0.35151452 0.6465819 0.6071083 0.04569926 0.5238333 0.7036216
## 6 0.08590000 0.40623716 0.7429590 0.5629311 0.04835577 0.4757043 0.6661522
## 7 0.09539228 0.44729862 0.8212295 0.5303257 0.05058896 0.4398905 0.6393530
## 8 0.10572956 0.47203664 0.8864889 0.5069906 0.05360389 0.4121002 0.6237307
##
## [[2]]
##      time          haz        sehaz     lowerhaz    upperhaz       Haz
## 1    0.00 0.0012027579 0.0002547998 0.0007940619 0.001821806 0.0000000
## 2  365.25 0.0016440659 0.0002813880 0.0011755278 0.002299352 0.5206705
## 3  730.50 0.0017644776 0.0003205709 0.0012358608 0.002519201 1.1611382
## 4 1095.75 0.0013967958 0.0002740174 0.0009509285 0.002051720 1.7460257
## 5 1461.00 0.0010246541 0.0002210092 0.0006713984 0.001563775 2.1834904
## 6 1826.25 0.0008044020 0.0002106302 0.0004814917 0.001343872 2.5140445
## 7 2191.50 0.0006239245 0.0002448002 0.0002891746 0.001346193 2.7751017
## 8 2556.75 0.0004607925 0.0002964758 0.0001305695 0.001626203 2.9719850
##        seHaz  lowerHaz  upperHaz       Surv     seSurv  lowerSurv upperSurv
## 1 0.00000000 0.0000000 0.0000000 1.00000000 0.00000000 1.00000000 1.0000000
## 2 0.09042499 0.3434407 0.6979002 0.59412674 0.05372243 0.49763584 0.7093282
## 3 0.19314401 0.7825828 1.5396935 0.31313329 0.06047861 0.21445125 0.4572260
## 4 0.29068118 1.1763011 2.3157503 0.17446686 0.05071392 0.09869302 0.3084182
## 5 0.36541104 1.4672979 2.8996829 0.11264771 0.04116270 0.05504072 0.2305477
## 6 0.42310149 1.6847808 3.3433081 0.08094038 0.03424594 0.03532005 0.1854852
## 7 0.47316123 1.8477227 3.7024806 0.06234323 0.02949836 0.02466235 0.1575957
## 8 0.52346939 1.9460038 3.9979661 0.05120160 0.02680245 0.01835295 0.1428438
``````

Finally, the hazards or cumulative hazards or survival functions can be plotted.

``````library(gridExtra)
plot(picf, type="surv", ylim=c(0, 1), xlab="Days since randomisation",
title = c("Subject A", "Subject B"), nrow=1)
``````

``````## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
``````

The results from `coxph` for these patients are shown in the plots below as black lines, in blue the results from `icpack`.

``````sfAB <- survfit(cox_fit, newdata=newd)
sfA <- sfAB[1]
sfB <- sfAB[2]
# Retrieve separate plots from icfit for comparison
picf1 <- picf[[1]]
class(picf1) <- c("predict.icfit", "data.frame")
plt1 <- plot(picf1, type='surv', ylim=c(0, 1),
xlab = "Days since randomisation",
title = "Subject A")
``````

``````picf2 <- picf[[2]]
class(picf2) <- c("predict.icfit", "data.frame")
plt2 <- plot(picf2, type='surv', ylim=c(0, 1),
xlab = "Days since randomisation",
title = "Subject B")
``````

``````oldpar <- par(mfrow=c(1, 2))
plot(sfA, xlab="Days since randomisation", ylab="Survival", main="Subject A")
lines(plt1\$data\$time, plt1\$data\$Surv, col = "blue")
lines(plt1\$data\$time, plt1\$data\$lowerSurv, col = "blue")
lines(plt1\$data\$time, plt1\$data\$upperSurv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
plot(sfB, xlab="Days since randomisation", ylab="Survival", main="Subject B")
lines(plt2\$data\$time, plt2\$data\$Surv, col = "blue")
lines(plt2\$data\$time, plt2\$data\$lowerSurv, col = "blue")
lines(plt2\$data\$time, plt2\$data\$upperSurv, col = "blue")
legend("topright", c("coxph", "icfit"), lwd=1, col=c("black", "blue"), bty="n")
``````

``````par(oldpar)
``````

Again, the estimates and their standard errors are very similar.