[R] Coxph with factors
Thomas Lumley
tlumley at u.washington.edu
Sat Jul 16 17:07:28 CEST 2005
On Sat, 16 Jul 2005, Kylie-Anne Richards wrote:
> Thank you for your help.
> ____________________________________________________________
>> In any case, to specify f.pom you need it to be a factor with the same set
>> of levels. You don't say what the lowest level of pom is, but suppose it is,
>> say, -3:
>>
>> f.pom=factor(-3, levels=seq(-3,2.5, by=0.5))
> ____________________________________________________________
>
> For this particular model, f.pom starts at -5.5 going to 2 in 0.5 increments.
> I seem to have misunderstood your explanation, as R is still producing an
> error.
In the model you showed, there were no factor levels below -2.5. You need
to make sure that the levels are the same in the initial data and the data
supplied to survfit. Check this with levels().
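A minimal sketch of that check, using made-up values for pom (the real data are not shown in this thread). The point is that the factor supplied to survfit should be built from levels() of the factor used in the fit, rather than from a freshly typed seq():

```r
# Hypothetical values standing in for the poster's pom variable
pom <- c(-5.5, -3, -3, 0, 2, 1.5)

# Factor as it might have been built for the original fit
f.pom <- factor(pom, levels = seq(-5.5, 2, by = 0.5))

# New covariate value: reuse the levels of the fitted factor
new.f.pom <- factor(-3, levels = levels(f.pom))

# The two sets of levels should be identical, and the new value must not be NA
identical(levels(f.pom), levels(new.f.pom))
is.na(new.f.pom)
```

If identical() returns FALSE, or the new value comes out as NA, the levels do not match and survfit will complain.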
> ____________________________________________________________
>> I would first note that the survival function at zero covariates is not a
>> very useful thing and is usually numerically unstable, and it's often more
>> useful to get the survival function at some reasonable set of covariates.
> ____________________________________________________________
>
> Please correct me if I'm wrong, I was under the impression that the survival
> function at zero covariates gave the baseline distribution. I.e. if given the
> baseline prob.,S_0, at time t, one could calculate the survival prob for
> specified covariates by
> S_0^exp(beta(vo)*specified(vo)+beta(po)*specified(po)+beta(f.pom at the level
> of interest)) for time t.
>
> Since I was unable to get survfit to work with specified covariates, I was
> using the survival probs of the 'avg' covariates, S(t), to determine the
> baseline at time t, i.e.
> S(t)^(1/exp(beta(vo)*mean(vo)+beta(po)*mean(po)+beta(f.pom-5.5)*mean(f.pom-5.5)+beta(f.pom-5.0)*mean(f.pom-5.0)+........).
> And then proceeding as mentioned in the above paragraph (clearly not an
> efficient way of doing things).
>
Yes, but you don't need to go via the baseline. The survival curves for
any two covariate vectors z1 and z2 are related by
S(t; z1) = S(t; z2)^exp(beta'(z1-z2))
where beta is the vector of fitted coefficients.
For convenience of mathematical notation, mathematical statisticians write
everything in terms of z2=0, and call this "the baseline". In the real
world, though, you are better off with a baseline defined at a covariate
value somewhere in the vicinity of the actual data. If, as is often the
case, the zero covariate value is a long way from the observed data, both
the computation of the survival curve at zero and the transformation to
the covariates you want are numerically ill-conditioned.
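As a small numerical illustration: under the Cox model, S(t; z) = exp(-H0(t) * exp(beta'z)), so the curve at z1 can be obtained from the curve at z2 by raising it to the power exp(beta'(z1-z2)), with no need to pass through z=0. The coefficients, covariate values and baseline cumulative hazard below are all invented for the example:

```r
# Hypothetical coefficients and covariate vectors (not from the thread)
beta <- c(vo = 0.8, po = -0.4)
z1   <- c(vo = 1.2, po = 3.0)
z2   <- c(vo = 1.0, po = 2.5)

# Hypothetical baseline cumulative hazard (at z = 0) on a small time grid
times <- c(1, 2, 5, 10)
H0    <- 0.05 * times

# Survival curve implied by the Cox model at covariate vector z
surv.at <- function(z) exp(-H0 * exp(sum(beta * z)))

S1.direct <- surv.at(z1)                               # computed directly at z1
S1.via.z2 <- surv.at(z2)^exp(sum(beta * (z1 - z2)))    # obtained from the curve at z2

all.equal(S1.direct, S1.via.z2)
```

The two vectors agree up to floating-point error, whichever z2 is chosen as the reference.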
So, you can use the "baseline" returned by survfit(fit), which is at
z2=fit$means, to do anything you can do with the baseline at z=0, and the
computations will be more accurate.
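A sketch of how this might look with the survival package, assuming the lung data set shipped with the package as a stand-in for the poster's data. The names d, f.ecog and nd are made up for the example, with f.ecog playing the role of f.pom. It asks survfit for curves at two covariate settings near the data, then recovers one curve from the other without ever computing a curve at z=0:

```r
library(survival)

# Stand-in data: drop rows with a missing ph.ecog, then make it a factor
d <- lung[!is.na(lung$ph.ecog), ]
d$f.ecog <- factor(d$ph.ecog)

fit <- coxph(Surv(time, status) ~ age + f.ecog, data = d)

# Two covariate settings, with factor levels copied from the fitted data
nd <- data.frame(
  age    = c(60, 70),
  f.ecog = factor(c(0, 1), levels = levels(d$f.ecog))
)

sf <- survfit(fit, newdata = nd)               # one curve per row of nd
lp <- predict(fit, newdata = nd, type = "lp")  # linear predictors beta'z (centred)

# Curve for row 1 recovered from the curve for row 2
S1.from.S2 <- sf$surv[, 2]^exp(lp[1] - lp[2])
all.equal(unname(sf$surv[, 1]), unname(S1.from.S2))
```

Only the difference lp[1] - lp[2] enters the calculation, so whatever centring predict() applies cancels out.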
-thomas