[R] proximity covariate to model dependence in cox model (question from Vito Muggeo)

Thu Jul 29 12:38:35 CEST 2004

(I received a messsage from Vito Muggeo [who sometimes posts here] about
my previous posts here. Thus, it might be of interest here)

> dear Mayeul,
> I read your post on R -list about survival analysis with multiple (two
if I
> am not right) events  per subject.

Sometimes subjetcs have even 3 events (i.e. civil wars for last 40
years)

> 1)How did you calculate the pxcw variable?
(Note: pxcm stands for ProXimity.of.lastCivil.War)

the answer is at
https://www.stat.math.ethz.ch/pipermail/r-help/2004-July/053556.html
after the words "I used fit$loglik to chose a.chosen.parameter from 8
values, for 3 types of events:"

I used (exp(-days.since.event.of.type.one/a1)), which gives an indicator
between 1 and 0 that falls quickly first then more slowly (see the end of
the above post,  053556.html)

Simply only use  event.of.type.one. I am not doing a competing risk
analysis: I am only counting one type of event (type.one), but the
covariates measure other events that may increase the risk.

The difficulty is to choose the good a1 parameter. You will typically need
values much lower than mine with your exapmle dataset.

It then become:

la<-c(263.5, 526.9,1053.9,2107.8,4215.6,8431.1) #list of values to choose.
Adapt it
from
z<-NULL;for(a1 in la) {coxtmp <-
(coxph(Surv(start,stop,status)~
+I(exp(-days.since.event.of.type.one/a1))
+ other.time.dependent.covariates
+cluster(id)
,data=x,robust=T))
rbind(z,c(a1,coxtmp$wald.test, coxtmp$rscore, coxtmp$loglik,
coxtmp$score))->z
}
z <- data.frame(z)
names(z) <- c("a1","wald.test", "rscore",
"NULLloglik","loglik", "score")
z[which.max(z$rscore),]
z[which.max(z$loglik),]

> Namely, for instance,given the following data set (standard in multiple
> event formulation):
> > data.frame(id=c(3,3,4,4,5),start=c(0,10,0,7,0),stop=c(10,15,7,20,9),
> + event=c(1,1,1,0,0),stratum=c(1,2,1,2,1))
>   id start stop event stratum
> 1  3     0   10     1       1
> 2  3    10   15     1       2
> 3  4     0    7     1       1
> 4  4     7   20     0       2
> 5  5     0    9     0       1
> how the pxcw variable is computed for each *row* in the dataset?
> should be something like "ifelse(stratum==1,0,(stop-start))", i.e.
> (0,5,0,13,0) , or am I wrong?

It looks computationnaly OK, but strange to me.You can choose the function
that matches best the dependence, maybe it is.
But, assuming risk jumps just after an event then decreases slowly , I
would rather use a very high value when there is no previous event, for
instance:
ifelse(stratum==1,100,(stop-start))
Otherwise, a very close event (yesterday) would nearly be coded like no
previous event!
or even better, truncate the decrease :
ifelse(stratum==1,100,ifelse(stop-start>100, 100,stop-start ))
Here, after 100 days, it is like having no previous event.

But you should divide your observations to account for the change in the
proximity variable.
Instead of:
> id start stop event stratum   proxim
> 4     0    7     1       1    100
> 4     7   20     0       2    13

use

> id start stop event stratum   proxim
> 4     0    7     1       1    100
> 4     7    8     0       2    1
> 4     8    9     0       2    2
.
.
> 4     19   20    0       2    13

With the function I used, I used a covariate (days.since.event) which was
coded 9999999 if no previous event, I need:
exp((-days.since.event)/a1)

> Is this a your idea or did you find it elsewhere? could you give me any
> reference?

The proximity covariate is from
http://www.worldbank.org/research/conflict/papers/CivilPeace.pdf

(But they do not use the counting process we use here. They only measure
covariates for all countries when one country experiments an event.
Thus, all the remaining is mine (I think))

>
> Hope you can help me,
> regards,
> vito
>