\documentclass{article}[11pt]
\usepackage{Sweave}
\usepackage{amsmath}
\addtolength{\textwidth}{1in}
\addtolength{\oddsidemargin}{-.5in}
\setlength{\evensidemargin}{\oddsidemargin}
\SweaveOpts{keep.source=TRUE, fig=FALSE}
% Ross Ihaka suggestions
\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
\fvset{listparameters={\setlength{\topsep}{0pt}}}
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}
\SweaveOpts{prefix.string=adjcurve,width=6,height=4}
\setkeys{Gin}{width=\textwidth}
%\VignetteIndexEntry{Validation}
<>=
options(continue=" ", width=60)
options(SweaveHooks=list(fig=function() par(mar=c(4.1, 4.1, .3, 1.1))))
pdf.options(pointsize=8) #text in graph about the same as regular text
library(survival, quietly=TRUE)
@
\newcommand{\imat}{H} % use H for the hessian rather than script I
\newcommand{\Cvar}{H^{-1}} % use H for the hessian rather than script I
\newcommand{\splus}{R} % ``the survival package'' would be an alternate
\newcommand{\xbar}{\overline x}
\newcommand{\lhat}{\hat \Lambda}
\def\bhat{\hat\beta} %define "bhat" to mean "beta hat"
\def\Mhat{\widehat M} %define "Mhat" to mean M-hat
\newcommand{\code}[1]{\texttt{#1}}
\title{Software validation}
\author{Terry M Therneau}
\date{Dec 2022 (updated)}
\begin{document}
\maketitle
\section{Introduction}
\begin{quotation}
`When I use a word,' Humpty Dumpty said, in rather a scornful tone,
`it means just what I choose it to mean - neither more nor less.'
`The question is,' said Alice, `whether you can make words mean so
many different things.'
`The question is,' said Humpty Dumpty, `which is to be master - that's all.'
-Lewis Caroll, \emph{Through the Looking Glass}
\end{quotation}
``Validatation'' has become a Humpty-Dumpty word: it is used for many different
things in
scientific research that it has become essentially meaningless
without further clarification.
One of the more common meanings assigned to it in the software
realm is ``repeatability'', i.e.
that a new release of a given package or routine
will give the same results as it did the week before.
Users of the software often assume the word implies a more
rigorous criterion, namely that the routine gives \emph{correct} answers.
Validation of the latter type is rare, however;
working out formally correct answers is boring, tedious work.
This note contains a set of examples of this latter type.
Although the data sets are very simple, the examples
have proven extremely useful in debugging the methods, not least
because all the intermediate steps of each calculation are transparent,
and have been incorporated into the formal test suite for the
survival package as the files \code{book1.R}, \code{book2.R}, etc. in the
\code{tests} subdirectory.
They also continue to be a resource for package's defence:
I have been told multiple times that some person or group cannot use R
in their work because ``SAS is validated'' while R is not.
The survival package passes all of the tests below
and SAS passes some but not all of them.
It is my hope that the formal test cases will
be a resource for developers on multiple platforms.
Portions of this work were included as an appendix in
the textbook of Therneau and Grambsch
\cite{Therneau00} precisely for this reason.
\section{Basic formulas}
All these examples have a single covariate.
Let $x_i$ be the covariate for each subject, $r_i= exp(x_i \beta)$
the risk score for the subject, and $w_i$ the case weight, if any.
Let $Y_i(t)$ be 1 if subject $i$ is at risk at time $t$ and 0 otherwise,
and $dN_i(t)$ be the death indicator which is 1 if subject $i$ has
an event at time $t$.
At each death time we have the following quantities:
\begin{align}
LPL(t) &= \frac{\sum_i dN_i(t) \log(r_i)}{\log\left( \sum_i Y_i(t) w_i r_i
\right)} \\
\xbar(t) &= \frac{\sum_i Y_i(t) w_i r_i x_i}{ \sum_i Y_i(t) w_i r_i}\\
U(t) &= \sum_i dN_i(t) (x_i - \xbar(t)) \label{U1} \\
H(t) &= \frac{\sum_i Y_i(t) (x_i - \xbar(t))^2}{\sum_i Y_i(t) w_i r_i}\\
&= \frac{\sum_i Y_i(t) x_i^2}{\sum_i Y_i(t) w_i r_i}- \xbar(t)^2
\label{H2} \\
\lambda(t) &= \frac{\sum_i w_i dN_i(t)}{\sum_i Y_i(t) w_i r_i}
\end{align}
The denominator for all the sums is the weighted number of subjects who
are at risk,
$LPL$ is the contribution to the log partial likelihood at time $t$, and
$\xbar$ and $H$ are the weighted mean and variance of the covariate $x$
at each time.
The sum of $H(t)$ over the death times is the second derivative of the
LPL, also known as the Hessian matrix.
$U$ is the contribution to the first derivative of the LPL at time $t$
and $\lambda$ is the increment in the baseline hazard function.
\section{Test data 1}
This data set of $n=6$ subjects has a single 0/1 covariate $x$.
There is one tied death time, one time with both a death and a censored
observation,
one with only a death, and one with only censoring.
(This is as small as a data set can be and still cover these four
important cases.)
Let $r = \exp(\beta)$ be the risk score for a subject with $x=1$;
the risk score is $\exp(0) =1$ for those with $x=0$.
Table \ref{tab:val1} shows the data set along with the mean and
increment to the hazard at each time point.
\begin{table}[b] \centering
\begin{tabular}{ccc|cc|cc}
&&& \multicolumn{2}{c|}{$\xbar(t)$} & \multicolumn{2}{c}{$d\lhat_0(t)$} \\
Time& Status& $x$& Breslow& Efron& Breslow& Efron \\ \hline
1&1&1&$r/(r+1)$ & $r/(r+1)$& $1/(3r+3)$ & $1/(3r+3)$ \\
1&0&1&&&&\\
6&1&1& $r/(r+3)$ & $r/(r+3)$ & $1/(r+3)$& $1/(r+3)$ \\
6&1&0& $r/(r+3)$ & $r/(r+5)$ & $1/(r+3)$& $1/(r+5)$ \\
8&0&0&&&& \\
9&1&0&0&0& 1& 1\\
\end{tabular}
\caption{Test data 1}
\label{tab:val1}
\end{table}
\subsection{Breslow estimates}
\label{sect:valbreslow}
The log partial likelihood (LPL) has a term for each event; each term is
the log of the ratio of the score for the subject who had an event over
the sum of scores for those who did not.
The LPL, first derivative $U$ of the LPL and
second derivative (or Hessian) $\imat$ are:
\begin{eqnarray*}
LPL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(r+3)\} +
\{0-\log(r+3)\} + \{0-0\} \\
&=& 2\beta - \log(3r+3) - 2\log(r+3). \\ \\
U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+3}\right) +
\left(0-\frac{r}{r+3} \right) + (0-0) \\
&=& \frac{-r^2 + 3r + 6}{(r+1)(r+3)} .\\ \\
-\imat&=& \left\{\frac{r}{r+1} - \left( \frac{r}{r+1} \right )^2\right\}
+2 \left\{\frac{r}{r+3} - \left( \frac{r}{r+3} \right )^2\right\} +
(0-0) \\
&=& \frac{r}{(r+1)^2} + \frac{6r}{(r+3)^2}.
\end{eqnarray*}
(For a 0/1 covariate the variance formula \eqref{H2} simplifies to
$\xbar - \xbar^2$, but only in that case. We used this fact above.)
The actual solution corresponds to $U(\beta)=0$, which from the quadratic
formula is $r=(1/2)(3 + \sqrt{33})$;
or $\bhat = \log(r) \approx 1.475285$.
The following function computes these quantities.
<>=
breslow1 <- function(beta) {
# first test data set, Breslow approximation
r = exp(beta)
lpl = 2*beta - (log(3*r +3) + 2*log(r+3))
U = (6+ 3*r - r^2)/((r+1)*(r+3))
H = r/(r+1)^2 + 6*r/(r+3)^2
c(beta=beta, loglik=lpl, U=U, H=H)
}
beta <- log((3 + sqrt(33))/2)
temp <- rbind(breslow1(0), breslow1(beta))
dimnames(temp)[[1]] <- c("beta=0", "beta=solution")
temp
@
The maximum partial likelihood occurs when $U(\beta)=0$, namely
$r^2 -3r -6 =0$.
Using the usual formula for a quadratic equation gives
$r=(1/2)(3 + \sqrt{33})$ and $\bhat = \log(r) \approx 1.475285$.
The above call to \code{breslow1} verifies that the first derivative
is zero at this point.
The Newton--Raphson iteration has increments of $-\Cvar U$.
Starting with the usual initial estimate of $\beta=0$, the first
iteration is $\beta=8/5$ and further ones are shown below.
<<>>=
iter <- matrix(0, nrow=6, ncol=4,
dimnames=list(paste("iter", 0:5),
c("beta", "loglik", "U", "H")))
# Exact Newton-Raphson
beta <- 0
for (i in 1:5) {
iter[i,] <- breslow1(beta)
beta <- beta + iter[i,"U"]/iter[i,"H"]
}
print(iter, digits=9)
# coxph fits
test1 <- data.frame(time= c(1, 1, 6, 6, 8, 9),
status=c(1, 0, 1, 1, 0, 1),
x= c(1, 1, 1, 0, 0, 0))
temp <- matrix(0, nrow=6, ncol=4,
dimnames=list(1:6, c("iter", "beta", "loglik", "H")))
for (i in 0:5) {
tfit <- coxph(Surv(time, status) ~ x, data=test1,
ties="breslow", iter.max=i)
temp[i+1,] <- c(tfit$iter, coef(tfit), tfit$loglik[2], 1/vcov(tfit))
}
temp
@
The \code{coxph} routine declares convergence after 4 iterations for this
data set, so the last two calls with \code{iter.max} of 4 and 5 give
identical results.
The martingale residuals are defined as $O-E$ = observed - expected, where
the observed is the number of events for the subject (0 or 1) and
$E$ is the expected number assuming that the model is
completely correct.
For the first death all 6 subjects are at risk,
and the martingale formulation views the outcome as a lottery in which
the subjects hold $r$, $r$, $r$, 1, 1 and 1 tickets, respectively.
The contribution to $E$ for subject 1 at time 1 is thus $r/(r+3)$.
Carrying this forward the residuals can be written as simple function of the
cumulative baseline hazard $\Lambda_0(t)$, the Nelson
cumulative hazard estimator with case weights of $w_ir_i$; this is
shown in the `Breslow' column of table \ref{tab:val1}.
(Also known as the Aalen estimate, Breslow estimate, and all possible
combinations of the three names.)
Then the residual can be written as
\begin{equation}
M_i = \delta_i - \exp(x_i\beta)\lhat(t_i) \label{mart}
\end{equation}
Each of the two subjects who die at time 6 are credited with the full
hazard increment at time 6.
Residuals at $\beta=0$ and $\bhat$ are shown in the table below.
<>=
mresid1 <- function(r) {
status <- c(1,0,1,1,0,1)
xbeta <- c(r,r,r,1,1,1)
temp1 <- 1/(3*r +3)
temp2 <- 2/(r+3) + temp1
status - xbeta*c(temp1, temp1, temp2, temp2, temp2, 1+ temp2)
}
r0 <- mresid1(1)
r1 <- round(mresid1((3 + sqrt(33))/2), 6)
@
\begin{center}
\begin{tabular}{c|lrr}
Subject &\multicolumn{1}{c}{$\Lambda_0$} & $\Mhat(0)$ & $\Mhat(\bhat)$ \\
\hline
1& $1/(3r+3)$ & $5/6$ & \Sexpr{r1[1]} \\
2& $1/(3r+3)$ & $-1/6$ & \Sexpr{r1[2]} \\
3& $1/(3r+3) + 2/(r+3)$ &$1/3$ & \Sexpr{r1[3]} \\
4& $1/(3r+3) + 2/(r+3)$ &$1/3$ & \Sexpr{r1[4]} \\
5& $1/(3r+3) + 2/(r+3)$ & $-2/3$ & \Sexpr{r1[5]} \\
6 & $1/(3r+3) + 2/(r+3) +1$ & $-2/3$ & \Sexpr{r1[6]}
\end{tabular}
\end{center}
The score statistic $U$ can be written as a two way sum
involving the covariate(s) and the martingale residuals
\begin{equation}
U = \sum_{i=1}^n \int [x_i - \xbar(t)] dM_i(t) \label{score}
\end{equation}
The martingale residual $M$ has jumps at the observed
deaths, leading to the table below with 6 rows and 3 columns.
The score residuals $L_i$ are defined as the per-patient contributions to
this total, i.e., the row sums, and the Schoenfeld residuals are the
per-time point contributions, i.e., the column sums.
\begin{center}
\begin{tabular}{cccc}
&\multicolumn{3}{c}{Time} \\
Subject & 1 & 6 & 9 \\ \hline
1&$ \left(1- \frac{r}{r+1} \right) \left(1- \frac{r}{3r+3} \right)$ &0&0 \\
2& $\left(1- \frac{r}{r+1} \right) \left(0- \frac{r}{3r+3} \right)$ &0&0 \\
3& $\left(1- \frac{r}{r+1} \right) \left(0- \frac{r}{3r+3} \right)$ &
$\left(1- \frac{r}{r+3} \right) \left(1- \frac{2r}{r+3} \right)$ & 0\\
4& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ &
$\left(0- \frac{r}{r+3} \right) \left(1- \frac{2}{r+3} \right)$ & 0\\
5& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ &
$\left(0- \frac{r}{r+3} \right) \left(0- \frac{2}{r+3} \right)$ & 0\\
6& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ &
$\left(0- \frac{r}{r+3} \right) \left(0- \frac{2}{r+3} \right)$ &
(0 - 0) (1-1)
\end{tabular}
\end{center}
At $\beta=0$ the score residuals are 5/12, -1/12, 7/24, -1/24, 5/24 and 5/24.
Showing that the three column sums are identical to the three terms of
equation \eqref{U1} is left as an exercise for the reader, namely
$1- \xbar(1)$, $(1- \xbar(6)) + (0- \xbar(6))$ and $1 - \xbar(9)$.
The computer program returns 4 residuals, one per event, rather than one
per death time as this has proven to be more useful for plots and other
downstream computations.
In the multivariate case there will be a matrix like the above for each
covariate.
Let $L$ be the $n$ by $p$ matrix made up of the collection of row sums where
$n$ is the number of subjects and $p$ is the number of covariates,
this is the matrix of score residuals.
The dfbeta residuals are the $n$ by $p$ matrix $D = L \Cvar$; $\imat$ has
been defined above for this data set.
$D$ is an approximate measure of the influence of each observation on the
solution vector.
Similarly, the scaled Schoenfeld residuals are the (number of events) by $p$
matrix obtained by multiplying the Schoenfeld residuals by $\Cvar$.
As stated above there is a close connection between the
Nelson-Aalen estimate estimate of cumulative hazard and the
Breslow approximation for ties.
The baseline hazard is shown as the column $\Lambda_0$ in table \ref{tab:val1}.
The estimated hazard for a subject with covariate $x_i$ is
$\Lambda_i(t) = \exp(x_i \beta) \Lambda_0(t)$ and the survival
estimate for the subject is $S_i(t)= \exp(-\Lambda_i(t))$.
The variance of the cumulative hazard is the sum of two terms.
Term 1 is a natural extension of
the Nelson--Aalen estimator to the case where there are
weights. It is a running sum, with an increment at each death of
$1/(\sum Y_i(t)r_i(t))^2$.
For a subject with covariate $x_i$ this term is multiplied by
$[\exp(x_i \beta)]^2$.
The second term is $c\Cvar c'$, where $\imat$ is the
information matrix of the
Cox model and $c$ is a vector.
The second term accounts for the fact that the
weights themselves have a variance;
$c$ is the derivative of $S(t)$ with respect to $\beta$
and can be formally written as
$$ \exp(x\beta) \int_0^t (\bar x(s) - x_i) d\hat\Lambda_0 (s)\,.
$$
This can be recognized as $-1$ times the score residual process
for a subject with $x_i$ as covariates and no events; it measures
leverage of a particular observation on the estimate of $\beta$. It is
intuitive that a small score residual --- an observation whose
covariates has little influence on $\beta$ --- results in a small added
variance; that is, $\beta$ has little influence on the estimated survival.
\begin{center}
\begin{tabular}{c|l}
Time & Term 1 \\ \hline
1& $1/(3r+3)^2$ \\
6& $1/(3r+3)^2 + 2/(r+3)^2$ \\
9& $1/(3r+3)^2 + 2/(r+3)^2 + 1/1^2$ \\ \multicolumn{2}{c}{} \\
Time & $c$ \\ \hline
1& $(r/(r+1))* 1/(3r+3)$ \\
6& $(r/(r+1))* 1/(3r+3) + (r/(r+3))* 2/(r+3)$ \\
9& $(r/(r+1))* 1/(3r+3) + (r/(r+3))* 2/(r+3) + 0*1$\\
\end{tabular}
\end{center}
For $\beta=0$, $x=0$:
\begin{center}
\begin{tabular}{c|lll}
Time & \multicolumn{2}{c}{Variance}&\\ \hline
1 & 1/36 &+ $1.6*(1/12)^2 $ &= 7/180\\
6 & (1/36 + 2/16) &+ $1.6*(1/12 + 2/16)^2 $ &= 2/9\\
9 & (1/36 + 2/16 + 1)&+ $1.6*(1/12 + 2/16 + 0)^2$&= 11/9\\
\end{tabular}
\end{center}
For $\beta=1.4752849$, $ x=0$
\begin{center}
\begin{tabular}{c|lll}
Time & \multicolumn{2}{c}{Variance}&\\ \hline
1 & 0.0038498 &+ .004021 &= 0.007871\\
2 & 0.040648 &+ .0704631 &= 0.111111\\
4 & 1.040648 &+ .0704631 &= 1.111111\\
\end{tabular}
\end{center}
\subsection{Efron approximation}
The Efron approximation \cite{Efron77}
differs from the Breslow only at day 6, where two
deaths occur.
A useful way to view the approximation is to recast the problem as
a lottery model. On day 1 there were 6 subjects in the lottery and
1 ticket was drawn, at which time the winner became ineligible for further
drawings and withdrew.
On day 6 there were 4 subjects in the drawing (at risk)
and two tickets (deaths) were drawn.
The Breslow approximation considers all four subjects to be eligible for
both drawings, which implies that one of them could in theory have won
both, that is, died twice. This is of clearly impossible.
The Efron approximation treats the two drawings on day 6 as sequential.
All four living subjects are at risk for the first of them, then the
winner is withdrawn.
Three subjects are eligible for the second drawing,
either subjects 3, 5, and 6 or subjects 2, 5, and 6, but we
do not know which.
In some sense then, subjects 3 and 4 each have ``.5 probability" of
being at risk for the second event at time 6.
In the computation, we treat the two deaths at time 6 as two separate times
(two terms
in the loglik), with subjects 3 and 4 each having a case weight of 1/2 for the
second one.
The mean covariate for the second event is then
$$
\frac{1*r/2 + 0*1/2 + 0*1 + 0*1 }
{r/2 + 1/2 + 1+1} = \frac{r}{r+5}
$$
and the main quantities are
\begin{eqnarray*}
LL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(r+3)\} + \{0-\log(r/2 +5/2)\}
+ \{0-0\} \\
&=& 2\beta - \log(3r+3) - \log(r+3) - \log(r/2 +5/2)\\ \\
U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+3}\right)
+ \left(0-\frac{r}{r+5}\right) + (0-0) \\
&=& \frac{-r^3 + 23r + 30}{(r+1)(r+3)(r+5)} \\ \\
I&=& \left\{\frac{r}{r+1} - \left( \frac{r}{r+1} \right )^2\right\}
+ \left\{\frac{r}{r+3} - \left( \frac{r}{r+3} \right )^2\right\} \\
&& + \left\{\frac{r}{r+5} - \left( \frac{r}{r+5} \right )^2\right\}.
\end{eqnarray*}
The solution corresponds to the one positive root of $U(\beta)=0$, which is
$r=2\sqrt{23/3}\cos(\phi/3)$
where $\phi=\arccos\{(45/23)\sqrt{3/23}\}$ via the standard formula
for the roots of a cubic equation.
This yields $r \approx 5.348721$ or $\bhat = \log(r) \approx 1.676857$.
Plugging this value into the formulas above yields
$$
\begin{array}{ll}
LL(0)=-4.276666 & LL(\bhat)=-3.358979 \\
U(0) = 52/48 & U(\bhat) = 0 \\
\imat(0) = 83/144 & \imat(\bhat) = 0.612632.
\end{array}
$$
The martingale residuals are again $O-E$, but the expected part of the
calculation changes.
For the first drawing at time 6 the total number of ``tickets'' in the
drawing is $r+1+1+1$; subject 4 has an increment of $r/(r+3)$ and the others
$1/(r+3)$ to their expected value.
For the second event at time 6 subjects 3 and 4 have a weight of 1/2,
the total number of tickets is $(r+5)/2$
and the consequent increment in the cumulative hazard is $2/(r+5)$.
This $\beta=0$ this calculation is equivalent to the Fleming-Harrington
\cite{Fleming84} estimate of cumulative hazard.
Subjects 3 and 4 receive 1/2 of this second increment to $E$ and subjects
5 and 6 the full increment.
Efron \cite{Efron77} did not discuss residuals so did not investigate this
aspect of the approximation, we nevertheless sometime refer to
this using combinations of
Fleming, Harrington, Efron in the same way as the Nelson-Aalen-Breslow
estimate. The martingale residuals are
\begin{center}
\begin{tabular}{cl}
Subject & $M_i$ \\ \hline
1 & $1 - r/(3r+3)$ \\
2 & $0 - r/(3r+3)$ \\
3 & $1 - r/(3r+3) - r/(r+3) - r/(r+5)$ \\
4 & $1 - 1/(3r+3) - 1/(r+3) - 1/(r+5)$ \\
5 & $0 - 1/(3r+3) - 1/(r+3) - 2/(r+5)$ \\
6 & $0 - 1/(3r+3) - 1/(r+3) - 2/(r+5) - 1$ \\
\end{tabular}
\end{center}
giving residuals at $\beta=0$ of 5/6, -1/6, 5/12, 5/12, -3/4 and -3/4.
The matrix defining the score and Schoenfeld residuals
has the same first column (time 1) and last column as before, with the
following contributions at time 6.
\begin{center}
\begin{tabular}{ccc}
&\multicolumn{2}{c}{Time} \\
Subject & 6 (first) & 6 (second)\\ \hline
1&0 & 0 \\
2&0 & 0 \\
3& $\left(1- \frac{r}{r+3} \right) \left(1- \frac{r}{r+3} \right)$ &
$\left(1- \frac{r}{r+5} \right) \left(1- \frac{2r}{r+5} \right)/2$ \\
4& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ &
$\left(0- \frac{r}{r+5} \right) \left(1- \frac{2}{r+5} \right)/2$ \\
5& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ &
$\left(0- \frac{r}{r+5} \right) \left(0- \frac{2}{r+5} \right)$ \\
6& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ &
$\left(0- \frac{r}{r+5} \right) \left(0- \frac{2}{r+5} \right)$
\end{tabular}
\end{center}
The score residuals at $\beta=0$ are 5/12, -1/12, 55/144, -5/144,
29/144 and 29/144.
It is an error to generate residuals for the Efron method by using
formula \eqref{mart}, which was derived from the Breslow approximation.
It is clear that some packages
do exactly this, however, which can be verified using formulas from above.
(Statistical forensics is another use for our results.)
What are the consequences of this?
On a formal level the resulting ``martingale residuals'' no longer have
an expected value of 0 and thus are not martingales,
so one loses theoretical backing for derived plots or statistics.
The score, Schoenfeld, dfbeta and scaled Schoenfeld residuals are based on
the martingale residual so suffer the same loss.
On a practical level, when the fraction of ties is small it is quite often the
case that $\bhat$ is nearly the same when using the Breslow and Efron
approach. We have normally found the correct and ad hoc residuals to
be similar as well in that case, sufficiently so that
explorations of functional form (martingale residuals), leverage and robust
variance (dfbeta) and proportional hazards (scaled Schoenfeld) led to
the same conclusions.
This may not hold when there is a large number of ties.
The variance formula for the baseline hazard function in the Efron case
is evaluated the same way as before, as
the sum of $(\mbox{hazard increment})^2$,
treating a tied death as multiple separate
hazard increments.
In term 1 of the variance, the variance increment at time 6 is
now $1/(r+3)^2 + 4/(r+5)^2$ rather than $2/(r+3)^2$.
The increment to $d$ at time 6 is
$(r/(r+3))* 1/(r+3) + (r/(r+5))* 2/(r+5)$.
(Numerically, the result of this
computation is intermediate between the
Nelson--Aalen variance and the
Greenwood variance used in the Kaplan--Meier.)
For $\beta=0$, $x=0$, let $v= \Cvar = 144/83$.
\begin{center}
\begin{tabular}{c|ll}
Time & \multicolumn{2}{c}{Variance}\\ \hline
1 & 1/36 \\ &
\quad + $v(1/12)^2 $ &= \phantom{0}119/2988\\
6 & (1/36 + 1/16 + 4/25)& \\
&\quad + $v(1/12 + 1/16+ 1/18)^2$ &= 1996/6225\\
9 & (1/36 + 1/16 + 4/25 + 1) \\&
\quad + $v(1/12 + 1/16 + 1/18 +0)^2$&= 8221/6225\\
\end{tabular}
\end{center}
For $\beta=1.676857$, $ x=0$.
\begin{center}
\begin{tabular}{c|ll}
Time & \multicolumn{2}{c}{Variance}\\ \hline
1 & 0.00275667 + .00319386 & = 0.0059505\\
2 & 0.05445330 + .0796212 & = 0.134075\\
4 & 1.05445330 + .0796212 &= 1.134075\\
\end{tabular}
\end{center}
\subsection{Exact partial likelihood}
Returning to the lottery analogy, for the two deaths at time 6
the exact partial likelihood computes the direct probability that
those two subjects would be selected given that a pair will be chosen.
The numerator is $r_3 r_4$, the product of the risk scores of the subjects
with an event, and the denominator is the sum over all 6 pairs
who could have been chosen: $r_3r_4 + r_3r_5 + r_3r_6 + r_4 r_5 + r_4r_6
+ r_5r_6$.
(If there were 10 tied deaths from a pool of 60 available the
sum will have over 75 billion terms, each a product of 10 values;
a truly formidable computation!)
In our case, three of the four subjects at risk at time 6 have a risk score
of $\exp(0x)=1$ and one a risk score of $r$,
and the denominator is $r +r+ r+1 +1 +1$.
\begin{eqnarray*}
LL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(3r+3)\} + \{0-0\} \\
&=& 2\{\beta - \log(3r+3)\}. \\ \\
U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+1}\right) + (0-0)\\
&=& \frac{2}{r+1}. \\ \\
-\imat&=& \frac{2r}{(r+1)^2}. \\
\end{eqnarray*}
The solution $U(\beta)=0$ corresponds to $r=\infty$, with a loglikelihood
that asymptotes to $-2\log(3)$ = 2.1972.
The Newton--Raphson iteration has increments of $(r+1)/r$ leading to
the following iteration for $\bhat$:
<>=
temp <- matrix(0, 8, 3)
dimnames(temp) <- list(paste0("iteration ", 0:7, ':'), c("beta", "loglik", "H"))
bhat <- 0
for (i in 1:8) {
r <- exp(bhat)
temp[i,] <- c(bhat, 2*(bhat - log(3*r +3)), 2*r/(r+1)^2)
bhat <- bhat + (r+1)/r
}
round(temp,3)
@
The Newton-Raphson iteration quickly settles down to addition of a constant
increment to $\bhat$ at each step
while the partial likelihood approaches an asymptote:
this is a fairly common case when the Cox MLE is infinite.
A solution at $\bhat=10$ or 15 is hardly different in likelihood from the
true maximum, and most programs will stop iterating around this point.
The information matrix, which measures the curvature of the likelihood
function at $\beta$, rapidly goes to zero as $\beta$ grows.
It is difficult to describe a satisfactory definition of the expected number
of events for each subject and thus a definition of the proper
martingale residual for the exact calculation.
Among other things it should lead to a consistent score residual,
i.e., ones that sum to the total score statistic $U$
\begin{align*}
L_i &= \int (x_i - \xbar(t)) dM_i(t) \\
\sum L_i &= U
\end{align*}
The residuals defined above for the Breslow and Efron approximations have this property,
for instance.
The exact partial likelihood contribution to $U$ for a set of set of $k$
tied deaths, however, is a sum of all subsets of size $k$; how would one
partition this term as a simple sum over subjects?
The exact partial likelihood is infrequently used and examination of
post fit residuals is even rarer.
The survival package (and all others that I know of) takes the easy
road in this case and uses equation \eqref{mart} along with the
Nelson-Aalen-Breslow hazard to form residuals.
They are certainly not correct, but the viable options were to use
this, the Efron residuals, or print an error message.
At $\bhat=\infty$ the Breslow residuals are still well defined.
Subjects 1 to 3, those with a covariate of 1, experience a hazard of
$r/(3r+3)=1/3$ at time 1. Subject 3 accumulates a hazard of 1/3 at
time 1 and a further hazard of 2 at time 6.
The remaining subjects are at an infinitely lower risk during days
1 to 6 and accumulate no hazard then, with subject 6 being credited with
1 unit of hazard at the last event.
The residuals are thus $1-1/3=2/3$, $0-1/3$, $1-7/3= -4/3$, $1-0$, 0, and 0,
respectively, for the six subjects.
\section{Test data 2}
This data set also has a single covariate, but in this case
a (start, stop] style of input is employed.
Table \ref{tab:val2} shows the data sorted by the end time of the risk
intervals. The columns for $\xbar$ and hazard are the values
at the event times; events occur at the end of
each interval for which status = 1.
\begin{table}\centering
\begin{tabular}{ccc|ccc}
&&&Number& \\
Time&Status&$x$&at Risk& $\xbar$& $d\lhat$ \\ \hline
(1,2] &1 &1 &2 &$r/(r+1)$ & $1/(r+1)$\\
(2,3] &1 &0 &3 & $r/(r+2)$ & $1/(r+2)$ \\
(5,6] &1 &0 &5 & $3r/(3r+2)$ & $1/(3r+2)$ \\
(2,7] &1 &1 &4 & $3r/(3r+1)$ & $1/(3r+1)$ \\
(1,8] &1 &0 &4 & $3r/(3r+1)$ & $1/(3r+1)$ \\
(7,9] &1 &1 &5 & $3r/(3r+2)$ & $2/(3r+2)$ \\
(3,9] &1 &1 && \\
(4,9] &0 &1 && \\
(8,14]&0 &0 &2& 0&0 \\
(8,17]&0 &0 &1& 0&0 \\
\end{tabular}
\caption{Test data 2}
\label{tab:val2}
\end{table}
\subsection{Breslow approximation}
For the Breslow approximation we have
\begin{eqnarray*}
LL &=& \log\left(\frac{r}{r+1}\right)
+\log\left(\frac{1}{r+2}\right)
+\log\left(\frac{1}{3r+2}\right) +\\
&& \log\left(\frac{r}{3r+1}\right)
+\log\left(\frac{1}{3r+1}\right)
+2\log\left(\frac{r}{3r+2}\right) \\
&=& 4\beta - \log(r+1) - \log(r+3)- 3\log(3r+2) -2\log(3r+1). \\ \\
U &=& \left(1-\frac{r}{r+1}\right) + \left(0-\frac{r}{r+2}\right)
+ \left(0-\frac{3r}{3r+2}\right) + \\
&& \left(1-\frac{3r}{3r+1}\right) +
\left(0-\frac{3r}{3r+1}\right) +
2\left(1-\frac{3r}{3r+2}\right) \\
\\ \\
\imat &=& \frac{r}{(r+1)^2} + \frac{2r}{(r+2)^2} +
\frac{6r}{(3r+2)^2} + \frac{3r}{(3r+1)^2} \\
&& \frac{3r}{(3r+1)^2} + \frac{12r}{(3r+2)^2} . \\
\end{eqnarray*}
In this case $U$ is a quartic equation
and we find the solution numerically.
<>=
ufun <- function(r) {
4 - (r/(r+1) + r/(r+2) + 3*r/(3*r+2) + 6*r/(3*r+1) + 6*r/(3*r+2))
}
rhat <- uniroot(ufun, c(.5, 1.5), tol=1e-8)$root
bhat <- log(rhat)
c(rhat=rhat, bhat=bhat)
@
The solution is at $U(\bhat)=0$ or $r \approx .9189477$;
$\bhat = \log(r) \approx -.084526$.
Then
<>=
true2 <- function(beta, newx=0) {
r <- exp(beta)
loglik <- 4*beta - log(r+1) - log(r+2) - 3*log(3*r+2) - 2*log(3*r+1)
u <- 1/(r+1) + 1/(3*r+1) + 4/(3*r+2) -
( r/(r+2) +3*r/(3*r+2) + 3*r/(3*r+1))
imat <- r/(r+1)^2 + 2*r/(r+2)^2 + 6*r/(3*r+2)^2 +
3*r/(3*r+1)^2 + 3*r/(3*r+1)^2 + 12*r/(3*r+2)^2
hazard <-c( 1/(r+1), 1/(r+2), 1/(3*r+2), 1/(3*r+1), 1/(3*r+1), 2/(3*r+2) )
xbar <- c(r/(r+1), r/(r+2), 3*r/(3*r+2), 3*r/(3*r+1), 3*r/(3*r+1),
3*r/(3*r+2))
# The matrix of weights, one row per obs, one col per time
# deaths at 2,3,6,7,8,9
wtmat <- matrix(c(1,0,0,0,1,0,0,0,0,0,
0,1,0,1,1,0,0,0,0,0,
0,0,1,1,1,0,1,1,0,0,
0,0,0,1,1,0,1,1,0,0,
0,0,0,0,1,1,1,1,0,0,
0,0,0,0,0,1,1,1,1,1), ncol=6)
wtmat <- diag(c(r,1,1,r,1,r,r,r,1,1)) %*% wtmat
x <- c(1,0,0,1,0,1,1,1,0,0)
status <- c(1,1,1,1,1,1,1,0,0,0)
xbar <- colSums(wtmat*x)/ colSums(wtmat)
n <- length(x)
# Table of sums for score and Schoenfeld resids
hazmat <- wtmat %*% diag(hazard) #each subject's hazard over time
dM <- -hazmat #Expected part
for (i in 1:6) dM[i,i] <- dM[i,i] +1 #observed
dM[7,6] <- dM[7,6] +1 # observed
mart <- rowSums(dM)
# Table of sums for score and Schoenfeld resids
# Looks like the last table of appendix E.2.1 of the book
resid <- dM * outer(x, xbar, '-')
score <- rowSums(resid)
scho <- colSums(resid)
# We need to split the two tied times up, to match coxph
scho <- c(scho[1:5], scho[6]/2, scho[6]/2)
var.g <- cumsum(hazard*hazard /c(1,1,1,1,1,2))
var.d <- cumsum( (xbar-newx)*hazard)
surv <- exp(-cumsum(hazard) * exp(beta*newx))
varhaz <- (var.g + var.d^2/imat)* exp(2*beta*newx)
list(loglik=loglik, u=u, imat=imat, xbar=xbar, haz=hazard,
mart=mart, score=score, rmat=resid,
scho=scho, surv=surv, var=varhaz)
}
val2 <- true2(bhat)
rtemp <- round(val2$mart, 6)
@
$$
\begin{array}{ll}
LL(0)= \Sexpr{round(true2(0)$loglik, 6)} &
LL(\bhat)= \Sexpr{round(val2$loglik,6)} \\
U(0) = -2/15 & U(\bhat) = 0 \\
\imat(0) = 2821/1800 & \imat(\bhat) = \Sexpr{round(val2$imat,6)} %$
\end{array}
$$
\def\haz{\hat \lambda}
The martingale residuals are (status--cumulative hazard)
or $O-E = \delta_i - \int Y_i(s) r_i d\lhat(s)$.
Let $\haz_1, \ldots, \haz_6$ be the six increments to the cumulative
hazard listed in Table \ref{tab:val2}. Then the cumulative hazards and
martingale residuals for the subjects are as follows.
\begin{center}
\begin{tabular}{c|lrr}
Subject &$\Lambda_i$ & $\Mhat(0)$ & $\Mhat(\bhat)$ \\ \hline
1& $r\haz_1$ & 1--30/60 & \Sexpr{rtemp[1]} \\
2& $\haz_2 $ & 1--20/60 & \Sexpr{rtemp[2]}\\
3& $\haz_3 $ & 1--12/60 & \Sexpr{rtemp[3]} \\
4& $r(\haz_2 + \haz_3 + \haz_4)$& 1--47/60 & \Sexpr{rtemp[4]}\\
5& $\haz_1+\haz_2+\haz_3+\haz_4+\haz_5$& 1--92/60 & \Sexpr{rtemp[5]}\\
6 &$r*(\haz_5 + \haz_6) $& 1--39/60 & \Sexpr{rtemp[6]} \\
7& $r*(\haz_3+\haz_4+\haz_5+ \haz_6)$& 1--66/60& \Sexpr{rtemp[7]}\\
8& $r*(\haz_3+\haz_4+\haz_5+ \haz_6)$& 0--66/60&\Sexpr{rtemp[8]} \\
9& $\haz_6$ & 0--24/60 & \Sexpr{rtemp[9]} \\
10& $\haz_6$ & 0--24/60 & \Sexpr{rtemp[10]}
\end{tabular}
\end{center}
The score and Schoenfeld residuals can be laid out in a tabular fashion.
Each entry in the table is the value of $\{x_i - \xbar(t_j)\} d\Mhat_i(t_j)$
for subject $i$ and event time $t_j$.
The row sums of the table are the score residuals for the subject;
the column sums are the Schoenfeld residuals at each event time.
Below is the table for $\beta= \log(2)$ ($r=2$).
This is a slightly more stringent test than the table for $\beta=0$,
since in this latter case a program could be missing a factor of
$r = \exp(\beta)=1$ and give the correct answer.
However, the results are much more compact than those for $\bhat$,
since the solutions are exact fractions.
%\newcommand{\pf}[2]{$\left(\frac{#1}{#2}\right)$} %positive fraction
%\newcommand{\nf}[2]{$\left(\frac{-#1}{#2}\right)$} %negative fraction
\newcommand{\pf}[2]{$\phantom{-}\frac{#1}{#2}$} %positive fraction
\newcommand{\nf}[2]{$-\frac{#1}{#2}$} %negative fraction
\renewcommand{\arraystretch}{1.5}
\begin{center}
\begin{tabular}{c|cccccc|c}
& \multicolumn{6}{c|}{Event Time} & Score\\
Id&2&3&6&7&8&9& Resid \\ \hline
1&\pf{1}{9} &&&&&& $\phantom{-}\frac{1}{9}$ \\
2&&\nf{3}{8} &&&&& $-\frac{3}{8}$ \\
3&&&\nf{21}{32}&&&& $-\frac{21}{32}$ \\
4&&\nf{1}{4} & \nf{1}{16} & \pf{5}{49} &&&
$-\frac{165}{784} $\\
5&\pf{2}{9}& \pf{1}{8} & \pf{3}{32} &
\pf{6}{49} & \nf{36}{49}&&
$-\frac{2417}{14112} $\\
6&&&&& \nf{2}{49} & \pf{1}{8} &
$\phantom{-}\frac{33}{392}$ \\
7&&& \nf{1}{16} & \nf{2}{49}& \nf{2}{49} &
\pf{1}{8} & $-\frac{15}{784}$ \\
8&&& \nf{1}{16} & \nf{2}{49}& \nf{2}{49} &
\nf{1}{8} & $-\frac{211}{784}$ \\
9&&&&&& \pf{3}{16} &$ \phantom{-}\frac{3}{16}$ \\
10&&&&&& \pf{3}{16} & $\phantom{-}\frac{3}{16}$ \\ \hline
& \pf{1}{3} &\nf{1}{2} & \nf{3}{4} &
\pf{1}{7} & \nf{6}{7} & \pf{1}{2} & \nf{95}{84} \\
&&&&&&& \\
& $\frac{1}{r+1}$ & $\frac{-r}{r+2}$ & $\frac{-3r}{r+2}$ & $\frac{1}{3r+1}$ &
$\frac{3r}{3r+1}$ & $\frac{4}{3r+2}$
\end{tabular}
\end{center}
Both the Schoenfeld and score residuals sum to the score statistic $U(\beta)$.
As discussed further above, programs will return two Schoenfeld residuals
at time 7, one for each subject who had an event at that time.
\subsection{Efron approximation}
This example has only one tied death time, so only the term(s) for the
event at time 9 change. The main quantities at that time point are as follows.
\begin{center}
\begin{tabular}{r|cc}
&Breslow & Efron \\ \hline
$LL$ & $2\log\left(\frac{r}{3r+2}\right)$ &
$\log\left(\frac{r}{3r+2}\right) + \log\left(\frac{r}{2r+2}\right)$ \\
$U$ & $\frac{2}{3r+2}$& $\frac{1}{3r+2} + \frac{1}{2r+2}$ \\
$\imat$& $2\frac{6r}{(3r+2)^2} $ &$\frac{6r}{(3r+2)^2} + \frac{4r}{(2r+2)^2}$\\
$d\lhat$ & $\frac{2}{3r+2} $ & $\frac{1}{3r+2} + \frac{1}{2r+2}$
\end{tabular}
\end{center}
\renewcommand{\arraystretch}{1}
\section{Test data 3}
This is very similar to test data 1, but with the addition of case
weights.
There are 9 observations, $x$ is a 0/1/2 covariate, and weights range from
1 to 4.
As before, let $r = \exp(\beta)$ be the risk score for a subject with $x=1$.
Table \ref{tab:val3} shows the data set along with the mean and
increment to the hazard at each point.
\begin{table} \centering
\begin{tabular}{cccc|cc}
Time&Status&$X$ & Wt& $\xbar(t)$ & $d\lhat_0(t)$ \\ \hline
1& 1& 2 & 1& $(2r^2+11r) d\lhat_0 =\xbar_1$ & $1/(r^2 + 11r +7)$ \\
1& 0& 0 & 2&& \\
2& 1& 1 & 3& $11r/(11r+5) = \xbar_2$ & $10/(11r+5)$ \\
2& 1& 1 & 4&& \\
2& 1& 0 & 3&& \\
2& 0& 1 & 2&& \\
3& 0& 0 & 1&& \\
4& 1& 1 & 2& $2r/(2r+1)= \xbar_3$ & $ 2/(2r+1)$ \\
5& 0& 0 & 1 & \\
\end{tabular}
\caption{Test data 3}
\label{tab:val3}
\end{table}
\subsection{Breslow estimates}
The likelihood is a product of terms, one for each death, of
the form
$$
\left( \frac{e^{X_i \beta}}{\sum_j Y_j(t_i) w_j e^{X_j \beta}}
\right) ^{w_i}
$$
For integer weights, this gives the same results as would
be obtained by replicating each observation the specified number of times,
which is in fact one motivation for the definition.
The definitions for the score vector $U$ and information matrix $\imat$
simply replace the mean and variance with weighted versions of the same.
Let $PL(\beta,w)$ be the log partial liklihood when all the observations
are given
a common case weight of $w$;
it is easy to prove that $PL(\beta,w) = wPL(\beta,1) - d\log(w)$ where $d$ is
the number of events.
One consequence of this is that $PL$ can be positive for weights
that are less than 1, a case which
sometimes occurs in survey sampling applications. (This can be a big
surprise the first time one encounters it.)
\begin{eqnarray*}
LL &=& \{2\beta - \log(r^2 + 11r +7)\} + 3\{\beta - \log(11r+5)\} \\
&& + 4\{\beta - \log(11r+5)\} +3\{0 - \log(11r+5)\} \\
&& + 2\{\beta - \log(2r+1) \} \\
&=& 11\beta - \log(r^2 + 11r +7) -10\log(11r+5) - 2\log(2r+1) \\\\
U &=& (2- \xbar_1) + 3(0-\xbar_2) + 4(1-\xbar_2) + 3(1-\xbar_2) +
2(1-\xbar_3) \\
&=& 11 - [(2r^2+11r)/(r^2+11r+7) + 10(11r/(11r+5)) + 2(2r/(2r+1))] \\
I &=& [(4r^2 + 11r)/(r^2 + 11r +7) - \xbar_1^2] + 10(\xbar_2 - \xbar_2^2)
+ 2(\xbar_3 - \xbar_3^2) \\
\end{eqnarray*}
The solution corresponds to $U(\beta)=0$ and can be computed using a
simple search for the zero of the equation.
<>=
ufun <- function(r) {
xbar <- c( (2*r^2 + 11*r)/(r^2 + 11*r +7), 11*r/(11*r + 5), 2*r/(2*r +1))
11- (xbar[1] + 10* xbar[2] + 2* xbar[3])
}
rhat <- uniroot(ufun, c(1,3), tol= 1e-9)$root
bhat <- log(rhat)
c(rhat=rhat, bhat=bhat)
@
From this we have
<>=
wfun <- function(r) {
beta <- log(r)
pl <- 11*beta - (log(r^2 + 11*r + 7) + 10*log(11*r +5) + 2*log(2*r +1))
xbar <- c((2*r^2 + 11*r)/(r^2 + 11*r +7), 11*r/(11*r +5), 2*r/(2*r +1))
U <- 11 - (xbar[1] + 10*xbar[2] + 2*xbar[3])
H <- ((4*r^2 + 11*r)/(r^2 + 11*r +7)- xbar[1]^2) +
10*(xbar[2] - xbar[2]^2) + 2*(xbar[3]- xbar[3]^2)
c(loglik=pl, U=U, H=H)
}
temp <- matrix(c(wfun(1), wfun(rhat)), ncol=2,
dimnames=list(c("loglik", "U", "H"), c("beta=0", "beta-hat")))
round(temp, 6)
@
When $\beta=0$, the three unique values for $\xbar$
at $t=1$, 2, and 4
are 13/19, 11/16 and 2/3,
respectively, and the increments to the cumulative hazard are
1/19, 10/16 = 5/8, and 2/3, see table \ref{tab:val3}.
The martingale and score residuals at $\beta=0$ and
$\bhat$ are
\begin{center}
\begin{tabular}{cc|lr}
Id &Time& \multicolumn{1}{c}{$M(0)$} &\multicolumn{1}{c}{$M(\bhat)$} \\ \hline
A&1 & $1-1/19 = 18/19 $&0.85531\\
B&1 & $0-1/19 = -1/19 $&-0.02593\\
C&2 & $1-(1/19 + 5/8)= 49/152 $&0.17636 \\
D&2 & $1-(1/19 + 5/8)= 49/152 $&0.17636\\
E&2 & $1-(1/19 + 5/8)= 49/152 $&0.65131\\
F&2 & $0-(1/19 + 5/8)= -103/152 $&-0.82364\\
G&3 & $0-(1/19 + 5/8)= -103/152 $&-0.34869\\
H&4 & $1-(1/19 + 5/8 +2/3)= -157/456 $&-0.64894\\
I&5 & $0-(1/19 + 5/8 +2/3)= -613/456 $&-0.69808\\
\end{tabular}
\end{center}
Score residuals at $\beta=0$ are
\begin{center}
\begin{tabular}{cc|r}
Id &Time& Score \\ \hline
A&1 &$(2-13/19)(1-1/19)$\\
B&1 &$(0-13/19)(0-1/19)$\\
C&2 &$ (1-13/19)(0-1/19) + (1-11/16)(1-5/8) $ \\
D&2 &$(1-13/19)(0-1/19) + (1-11/16)(1-5/8)$ \\
E&2 &$(0-13/19)(0-1/19) + (0-11/16)(1-5/8)$ \\
F&2 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8)$ \\
G&3 &$(1-13/19)(0-1/19) + (0-11/16)(0-5/8)$ \\
H&4 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8) $ \\
&& $ + (1-2/3)(1-2/3)$ \\
I&5 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8)$ \\
& &$+ (0-2/3)(0-2/3) $ \\
\end{tabular}
\end{center}
{\splus} also returns unweighted residuals by default, with an option to
return the weighted version;
it is the weighted sum of residuals that totals zero, $\sum w_i \Mhat_i=0$.
Whether the weighted or the unweighted form is more useful depends on the
intended application, neither is more ``correct'' than the other.
{\splus} does differ for the dfbeta residuals, for which the default is
to return weighted values. For the third observation in this data set,
for instance, the unweighted dfbeta is an approximation to the change in
$\bhat$ that will occur if the case weight is changed from 2 to 3,
corresponding to deletion of one of the three ``subjects'' that this
observation represents, and the weighted form approximates a change in
the case weight from 0 to 3, i.e., deletion of the entire observation.
The increments of the Nelson-Aalen estimate of the hazard are shown
in the rightmost column of table \ref{tab:val3}.
The hazard estimate for a hypothetical subject with covariate $X^\dagger$ is
$\Lambda_i(t) = \exp(X^\dagger \beta) \Lambda_0(t)$ and the survival
estimate is $S_i(t)= \exp(-\Lambda_i(t))$.
The two term of the variance, for $X^\dagger=0$, are Term1 + $d'Vd$:
\begin{center}
\begin{tabular}{c|l}
Time & Term 1 \\ \hline
1& $1/(r^2 + 11r+7)^2$ \\
2& $1/(r^2 + 11r+7)^2 + 10/(11r+5)^2$ \\
4& $1/(r^2 + 11r+7)^2 + 10/(11r+5)^2 + 2/(2r+1)^2$ \\
\multicolumn{2}{c}{} \\
Time & $d$ \\ \hline
1& $(2r^2+11r)/(r^2+11r+7)^2$ \\
2& $(2r^2+11r)/(r^2+11r+7)^2 + 110r/(11r+5)^2$ \\
4& $(2r^2+11r)/(r^2+11r+7)^2 + 110r/(11r+5)^2 + 4r/(2r+1)^2$
\end{tabular}
\end{center}
For $\beta=\log(2)$ and $X^\dagger =0$,
where $k\equiv$ the variance of
$\bhat$ = 1/2.153895 this reduces to
\begin{center}
\begin{tabular}{c|ll}
Time & \multicolumn{2}{c}{Variance}\\ \hline
1 & 1/1089 &+ $k(30/1089)^2$ \\
2 & (1/1089+ 10/729) &+ $k(30/1089+ 220/729)^2 $ \\
4 & (1/1089+ 10/729 + 2/25)&+ $k(30/1089+ 220/729 + 8/25)^2$\\
\end{tabular}
\end{center}
giving numeric values of 0.0012706, 0.0649885, and 0.2903805,
respectively.
\subsection{Efron approximation}
For the Efron approximation the combination of tied times and case weights
can be approached in at least two ways.
One is to treat the case weights as replication counts. There are then
10 tied deaths at time 2 in the data above, and the Efron approximation
involves 10 different denominator terms.
Let $a= 7r+3$, the sum of risk scores for the 3 observations with an event
at time 2 and $b=4r+2$, the sum of risk scores for the other subjects at
risk at time 2.
For the replication approach, the loglikelihood is
\begin{eqnarray*}
LL &=& \{2\beta - \log(r^2 + 11r +7)\} + \\
&& \{7\beta - \log(a+b) - \log(.9a+b) - \ldots - \log(.1a+b) \} + \\
&& \{2\beta - \log(2r+1) - \log(r+1)\}.
\end{eqnarray*}
A test program can be created by comparing results from the weighted
data set (9 observations) to the unweighted replicated data set (19
observations).
This is the approach taken by SAS \code{phreg} using the \code{freq}
statement. It's advantage is that the appropriate
result for all of the weighted computations is perfectly clear
the disadvantage is that the only integer case weights are supported.
(A secondary advantage is that I did not need to create another
algebraic derivation for this appendix.)
A second approach, used in the survival package, allows for non-integer weights.
The data is considered to be 3 tied observations, and the
log-likelihood at time 2 is the sum of 3 weighted terms.
The first term of the three is one of
\begin{eqnarray*}
&& 3 [\beta - \log(a+b)] \\
&& 4 [\beta - \log(a+b)] \\
&{\rm or}&3 [0 - \log(a+b)],
\end{eqnarray*}
depending on whether the event for observation C, D or E
actually happened first
(had we observed the time scale more exactly);
the leading multiplier of 3, 4 or
3 is the case weight.
The second term is one of
\begin{eqnarray*}
&& 4 [\beta - \log(4s+3+b)] \\
&& 3 [0 - \log(4s+3+b)] \\
&& 3 [\beta - \log(3s+3+b)] \\
&& 3 [\beta - \log(3s+3+b)] \\
&& 3 [0 - \log(4s+3+b)] \\
&{\rm or}&4 [\beta - \log(4s+3+b)].
\end{eqnarray*}
The first choice corresponds to an event order of observation C then D
(subject D has the event, with D and E still at risk), the
second to $C \rightarrow E$, then $D\rightarrow C$, $D\rightarrow E$,
$E \rightarrow C$ and $E \rightarrow D$, respectively.
For a weighted Efron approximation first replace the argument to the $\log$
function by its average argument, just as in the unweighted case.
Once this is done the average term in the above corresponds to using an
average weight of 10/3.
The final log-likelihood and score statistic are
\begin{eqnarray*}
LL &=& \{2\beta - \log(r^2 + 11r +7)\} \\ && +
\{7\beta - (10/3)[\log(a+b) + \log(2a/3 +b) + \log(a/3+b)] \} \\&& +
2\{\beta - \log(2r+1) \} \\ \\
U &=& (2- \xbar_1) + 2(1-\xbar_3) \\
&& + 7 - (10/3)[\xbar_2 + 26r/(26r+12) + 19r/(19r+9)] \\
&=& 11 -(\xbar_1 + (10/3)(\xbar_2 + \xbar_{2b} +\xbar_{2c})
+ 2\xbar_3)\\ \\
I &=& [(4s^2+11s)/(s^2+11s+7)- \xbar_1^2] \\
&&+ (10/3)[ (\xbar_2- \xbar_2^2) + (\xbar_{2b}- \xbar_{2b}^2)
+ (\xbar_{2b}- \xbar_{2b}^2) \\
&&+2(\xbar_3 - \xbar_3^2) \\
\end{eqnarray*}
The solution is at $\beta=.87260425$, and
$$
\begin{array}{ll}
LL(0)=-30.29218 & LL(\bhat)=-29.41678 \\
U(0) = 2.148183 & U(\bhat) = 0 \\
\imat(0) = 2.929182 & \imat(\bhat) = 1.969447 \,.
\end{array}
$$
The hazard increment and mean at times 1 and 4 are identical to those
for the Breslow approximation, as shown in table \ref{tab:val3}.
At time 2, the number at risk for the first, second and third portions
of the hazard increment
are $n_1= 11r+5$, $n_2= (2/3)(7r+3) + 4r+2 = (26r+12)/3$, and
$n_3=(1/3)(7r+3) + 4r+2 = (19r+9)/3$.
Subjects F--I experience the full hazard at time 2
of $(10/3)(1/n_1 + 1/n_2 + 1/n_3)$,
subjects B--D experience $(10/3)(1/n_1 + 2/3n_2 + 1/3n_3)$.
Thus, at $\beta=0$ the martingale residuals are
\begin{center}
\begin{tabular}{cc|ll}
Id& Time & \multicolumn{1}{c}{$\Mhat(0)$} \\ \hline
A & 1 & 1 - 1/19 & = 18/19 \\
B & 1 & 0 - 1/19 & = -1/19 \\
C & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84)& =473/1064 \\
D & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84)& =473/1064 \\
E & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84) &=473/1064 \\
F & 2 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28)& =-2813/3192 \\
G & 3 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28) &=-2813/3192 \\
H & 4 & 1 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28 + 2/3) &=-1749/3192 \\
I & 5 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28 + 2/3) &=-4941/3192
\end{tabular}
\end{center}
The hazard estimate for a hypothetical subject with covariate $X^\dagger$ is
$\Lambda_i(t) = \exp(X^\dagger \beta) \Lambda_0(t)$,
$\Lambda_0$ has increments of $1/(r^2 + 11r +7$,
$(10/3)(1/n_1 + 1/n_2 + 1/n_3)$ and $2/(2r+1)$.
This increment at time 2 is a little larger than the Breslow jump of
$10/d1$.
The first term of the variance will have an increment of
$[\exp((X^\dagger \beta)(]^2 (10/3)(1/n_1^2 + 1/n_2^2 + 1/n_3^2)$ at time 2.
The increment to the cumulative distance from the center $d$ will be
\begin{eqnarray*}
&& [X^\dagger - \frac{11r}{11r+5}] \frac{10}{3 n_1} \\
&+& [X^\dagger -\frac{(2/3)7r + 4r}{n2} ] (10/3)(1/n_2) \\
&+& [X^\dagger -\frac{(1/3)7r + 4r}{n2} ] (10/3)(1/n_3)
\end{eqnarray*}
For $X^\dagger = 1$ and $\beta=\pi/3$ we get cumulative hazard and variance
below. We have $r\equiv e^\pi/3$, $V$=
%\begin{tabular}{l$=$l|l$+$l$=$r}
%\multicolumn{2}{c}{$\Lambda$} & \multicolumn{3}{c}{Variance} \\ \hline
%e/(r^2+ 11r+7) & 0.03272832 & e^2/(r^2+ 11r+7)^2 &
\section{Multi-state data}
\begin{figure}
<>=
states <- c("Entry", "a", "b", "c")
smat <- matrix(0, 4, 4, dimnames=list(states, states))
smat[1,2:3] <- 1
smat[2,3] <- smat[3,2] <- smat[3,4] <- smat[2,4] <- 1
statefig(c(1,2,1), smat)
@
\caption{A simple 4 state model.}
\label{mstate1}
\end{figure}
Figure \ref{mstate1} shows a simple multi-state model, while table
\ref{mstate1t} shows a data set for the model.
Subject 1 follows the path of Entry, a, b, a, with no further follow
up after the final transition, while subjects 4 and 5 are 'censored';
they have further follow-up after the last observed change of state.
\begin{table} \centering
\begin{tabular}{ccrcc}
id & time1& time2 & event x \\ \hline
1 & 0 & 4 & a & 0 \\
1 & 4 & 9 & b & 0 \\
1 & 9 & 10 & a & 0 \\
2 & 0 & 5 & b & 1 \\
3 & 2 & 9 & c & 1 \\
4 & 0 & 2 & a & 0 \\
4 & 2 & 8 & c & 0\\
4 & 8 & 9 & & 0 \\
5 & 1 & 3 & b & 2\\
5 & 3 & 11 & & 2
\end{tabular}
\caption{A multi-state data set.}
\label{mstate1t}
\end{table}
(This section not yet finished.)
\bibliographystyle{plain}
\bibliography{refer}
\end{document}