[R] R-help Digest, Vol 110, Issue 23
Terry Therneau
therneau at mayo.edu
Mon Apr 23 14:57:15 CEST 2012
Yes, the (start, stop] formalism is the easiest way to deal with
time-dependent data.
Each individual needs only enough rows to describe their own history.
For example, if id number 4 is in house 1, their housemate #1 was eaten
at time 2, and they were eaten at time 10, the following is sufficient
data for that subject:
  id  house  time1  time2  status  discovered
   4      1      0      2       0  false
   4      1      2     10       1  true
We don't need observations for each intermediate time, only that from
0-2 they were not yet discovered and that from 2-10 they were. The
status variable tells whether an interval ended in disaster. Use
Surv(time1, time2, status) on the left side of the model formula.
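
As a rough illustration of the layout described above, here is a minimal
sketch that expands a one-row-per-individual data set into (start, stop]
rows by splitting each animal's follow-up at the time of the first event in
its house. The data frame `wide`, its column names, the values in it, and the
helper `split_one` are all made up for the example; only the target columns
(id, house, time1, time2, status, discovered) come from the table above.

```r
library(survival)

# Hypothetical one-row-per-individual data (names and values are assumptions):
# 'time' is the check at which the animal was found eaten or was last seen,
# 'status' is 1 if eaten and 0 if censored.
wide <- data.frame(
  id     = 1:6,
  house  = c(1, 1, 1, 2, 2, 3),
  time   = c(2, 7, 10, 5, 7, 7),
  status = c(1, 0, 1, 1, 1, 0)
)

# Time of the first event in each house (Inf if the house never had an event).
first_event <- tapply(ifelse(wide$status == 1, wide$time, Inf),
                      wide$house, min)
wide$disc_time <- as.numeric(first_event[as.character(wide$house)])

# Split one individual's follow-up at the discovery time of its house.
split_one <- function(r) {
  if (r$time <= r$disc_time) {
    # Follow-up ended at or before the house was discovered: one interval.
    data.frame(id = r$id, house = r$house, time1 = 0, time2 = r$time,
               status = r$status, discovered = FALSE)
  } else {
    # Undiscovered up to disc_time, discovered afterwards: two intervals.
    data.frame(id = r$id, house = r$house,
               time1 = c(0, r$disc_time), time2 = c(r$disc_time, r$time),
               status = c(0, r$status), discovered = c(FALSE, TRUE))
  }
}

long <- do.call(rbind,
                lapply(seq_len(nrow(wide)), function(i) split_one(wide[i, ])))

# Counting-process response built from the expanded rows.
y <- with(long, Surv(time1, time2, status))
```

With real data of reasonable size the model would then be fit along the
lines of coxph(Surv(time1, time2, status) ~ discovered, data = long); the toy
data here are too small for a meaningful fit. Individuals eaten at the same
check as the first event in their house are all treated as not yet
discovered, which is one possible convention rather than the only one.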
Since the time scale is discrete you should technically use
method='exact' in a Cox model, but the default Efron approximation will
be very close.
Interval censoring isn't necessary. You will have a model of "time to
discovery" instead of "time to eaten", but with a fixed examination
schedule such as you have there is no information in the data to help
you move from one to the other. The standard interval approach would
just assume deaths happened at the midpoint between examinations.
Terry T.
On 04/21/2012 05:00 AM, r-help-request at r-project.org wrote:
> Dear R users,
>
> I fear this is terribly trivial but I'm struggling to get my head around it.
>
> First of all, I'm using the "survival" package in R 2.12.2 on Windows Vista with the RExcel plugin. You probably only need to know that I'm using "survival" for this.
>
> I have data collected from 180 or so individuals that were checked 7 times throughout a trial with set start and end times. Once the event happens (death by predator) there are no more checks for that individual. This means that I check on each individual up to 7 times with either an event recorded or the final time being censored.
>
> At the moment, I have a data sheet with one observation per individual; that is either the event time (the observation time when the individual had had an event) or the censored time. However, I'd like to add a time dependent factor and I also wonder if this data should be treated as interval censored.
>
> The time dependent factor is like this. The individuals are grouped in "houses" and once one individual in a group has an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered the others in the "house" as well (the predator is able to consume many individuals). At the moment I'm coding this as a normal two level factor (discovered) where all individuals alive after the first event in that house are "TRUE" and the first individuals in a house to be eaten are "FALSE". All individuals in houses that were not discovered at all are also "FALSE". Obviously, all individuals that were eaten were first discovered, then eaten. However, the first individuals in a house to be eaten had not been previously discovered by the predator (not observably so, anyway).
>
> Should I write up this data set with a start and stop time for every check I made so each individual has up to 7 records, one for each time I checked?
>
> Is there a quick and easy way to do this in R or would I have to go through the data set manually?
>
> Does coding the "discovered" factor the way I have, make statistical sense?
>
> Should I worry about proportional hazards of the "discovered" factor? It seems to me that it would often turn out not proportional because of its nature.
>
> Sorry, lots of stats questions. I don't mind if you don't answer all of these. Just knowing how to best feed this data into R would help me no end. The rest I can probably glean from the millions of survival analysis books I have lying about.