[R] Results from clogit out of range?
Terry Therneau
therneau at mayo.edu
Mon Mar 4 15:04:29 CET 2013
I'm late to this discussion, but let me try to put it in another context.
Assume that I wanted to know whether kids who live west of their school or east of
their school are more likely to be early (some hypothesis about walking slower if the sun
is in their eyes). So I create a 0/1 east/west variable and record the arrival times of
10 students at each of 100 different schools. Fit the model
lm(arrive ~ factor(school) + east.west)
where "arrive" is on some common scale like "minutes since midnight". Since different
schools could have different starting times for their first class, we need an intercept per
school.
Two questions:
1. Incremental effect: the coefficient of east/west measures the incremental effect
across all schools. With n = 1000 it is likely estimated with high precision.
2. Absolute: predict the average arrival time (on the clock) for students.
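A minimal simulation of the schools example, to make the contrast between the two questions concrete. Everything here is an assumption for illustration: the start times, the 2-minute east/west effect, and the 5-minute noise are made up. The standard error of the shared east.west coefficient draws on all 1000 students, while the intercept (school 1's level) rests on roughly 10.

```r
# Hypothetical data: 100 schools, 10 students each (all values assumed).
set.seed(1)
n.school <- 100
n.per    <- 10
school    <- rep(seq_len(n.school), each = n.per)
east.west <- rbinom(n.school * n.per, 1, 0.5)   # 0 = west, 1 = east (assumed coding)
start     <- runif(n.school, 7 * 60, 9 * 60)     # start times, minutes since midnight
effect    <- 2                                   # assumed true east/west effect, minutes
arrive    <- start[school] - 10 + effect * east.west +
             rnorm(n.school * n.per, sd = 5)

fit <- lm(arrive ~ factor(school) + east.west)
se  <- sqrt(diag(vcov(fit)))

# Question 1: one shared effect, estimated from every school.
coef(fit)["east.west"]
se["east.west"]

# Question 2: a school's absolute level depends on its own ~10 students,
# so its standard error is several times larger.
se["(Intercept)"]
```

With only two students per school, as in a typical matched-pair clogit design, the per-school levels would be noisier still, while the east.west coefficient would still pool information across all schools.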
Conditional logistic regression is very much like this. We have a large number of strata
("schools") with a small number of observations in each (often only 2 per stratum). One can
ask incremental questions about variables common to all strata, but absolute prediction is
pretty worthless: (a) you can only do it for schools (strata) that have already been seen,
and (b) there are so few subjects in each of them that the estimates are very noisy.
The default prediction from clogit is focused on questions of type 1. The
documentation doesn't even bother to mention predictions of type 2, which would be
probabilities of events. I can think of a way to extract such output from the routine
(being the author gives some insight), but why would I want to?
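A minimal sketch of the clogit side using survival::clogit on simulated matched pairs. The data, the names `pair`, `x`, and `case`, and the true log odds ratio of 1 are assumptions for illustration. Within each pair exactly one member is the case, chosen with probability proportional to exp(beta * x), which is the model clogit assumes. The coefficient answers the incremental question. predict() defaults to the linear predictor, and type = "risk" returns its exponential: relative quantities rather than probabilities of an event, so values above 1 are to be expected.

```r
library(survival)   # provides clogit() and strata()

set.seed(2)
n.pair <- 500
beta   <- 1                                  # assumed true log odds ratio per unit of x
pair   <- rep(seq_len(n.pair), each = 2)
x      <- rnorm(2 * n.pair)

# exp(b*x1) / (exp(b*x1) + exp(b*x2)) equals plogis(b * (x1 - x2)).
x1 <- x[c(TRUE, FALSE)]                      # first member of each pair
x2 <- x[c(FALSE, TRUE)]                      # second member of each pair
first.is.case <- rbinom(n.pair, 1, plogis(beta * (x1 - x2)))
case <- as.vector(rbind(first.is.case, 1 - first.is.case))  # one case per pair

fit <- clogit(case ~ x + strata(pair))
summary(fit)$coefficients                    # type 1: one effect shared by all pairs

lp   <- predict(fit)                         # default type = "lp" (linear predictor)
risk <- predict(fit, type = "risk")          # exp(lp): relative, not a probability
range(risk)
```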
Terry Therneau