# [R] Coxph with factors

Thomas Lumley tlumley at u.washington.edu
Sat Jul 16 17:07:28 CEST 2005

```
On Sat, 16 Jul 2005, Kylie-Anne Richards wrote:

> Thank you for your help.
> ____________________________________________________________
>> In any case, to specify f.pom you need it to be a factor with the same set
>> of levels.  You don't say what the lowest level of pom is, but if it is,
>> say, -3:
>>
>> f.pom=factor(-3, levels=seq(-3,2.5, by=0.5))
> ____________________________________________________________
>
> For this particular model, f.pom starts at -5.5 going to 2 in 0.5 increments.
> I seem to have misunderstood your explanation, as R is still producing an
> error.

In the model you showed, there were no factor levels below -2.5.  You need
to make sure that the levels are the same in the initial data and the data
supplied to survfit.  Check this with levels().

> ____________________________________________________________
>> I would first note that the survival function at zero covariates is not a
>> very useful thing and is usually numerically unstable, and it's often more
>> useful to get the survival function at some reasonable set of covariates.
> ____________________________________________________________
>
> Please correct me if I'm wrong, I was under the impression that the survival
> function at zero covariates gave the baseline distribution. I.e. if given the
> baseline prob.,S_0, at time t, one could calculate the survival prob for
> specified covariates by
> S_0^exp(beta(vo)*specified(vo)+beta(po)*specified(po)+beta(f.pom at the level
> of interest)) for time t.
>
> Since I was unable to get survfit to work with specified covariates, I was
> using the survival probs of the 'avg' covariates, S(t), to determine the
> baseline at time t, i.e.
> S(t)^(1/exp(beta(vo)*mean(vo)+beta(po)*mean(po)+beta(f.pom-5.5)*mean(f.pom-5.5)+beta(f.pom-5.0)*mean(f.pom-5.0)+........).
> And then proceeding as mentioned in the above paragraph (clearly not an
> efficient way of doing things).
>

Yes, but you don't need to go via the baseline.  The survival curves for
any two covariate vectors z1 and z2 are related by

S(t; z1) = S(t; z2)^exp(beta'(z1-z2))

where beta is the vector of fitted coefficients.

For convenience of mathematical notation, mathematical statisticians write
everything in terms of z2=0, and call this "the baseline". In the real
world, though, you are better off with a baseline defined at a covariate
value somewhere in the vicinity of the actual data. If, as is often the
case, the zero covariate value is a long way from the observed data, both
the computation of the survival curve at zero and the transformation to
the covariates you want are numerically ill-conditioned.

So, you can use the "baseline" returned by survfit(fit), which is at
z2=fit$means, to do anything you can do with the baseline at z=0, and the
computations will be more accurate.

-thomas

```
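
The two points in the reply (matching factor levels in the data passed to survfit, and moving between covariate vectors without going through z=0) can be sketched in R roughly as follows. This is only an illustration under assumptions that are not in the thread: it uses the `lung` data set shipped with the survival package in place of the poster's data, a stand-in factor `f.ecog` in place of `f.pom`, and arbitrary covariate values for `z1` and `z2`.

```r
library(survival)

# Stand-in data (assumption): built-in lung data, ph.ecog treated as a factor.
d <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
d$f.ecog <- factor(d$ph.ecog)

fit <- coxph(Surv(time, status) ~ age + f.ecog, data = d)

# New data must carry the factor with the same levels as the fitting data.
lev <- levels(d$f.ecog)
z1 <- data.frame(age = 70, f.ecog = factor("2", levels = lev))
z2 <- data.frame(age = 60, f.ecog = factor("0", levels = lev))
stopifnot(identical(levels(z1$f.ecog), levels(d$f.ecog)))

# Survival curves at the two covariate vectors.
s1 <- survfit(fit, newdata = z1)
s2 <- survfit(fit, newdata = z2)

# Difference in linear predictors, beta'(z1 - z2); any centering cancels.
dlp <- unname(predict(fit, newdata = z1, type = "lp") -
              predict(fit, newdata = z2, type = "lp"))

# Check S(t; z1) against S(t; z2)^exp(beta'(z1 - z2)) at every time point.
ok <- all.equal(as.numeric(s1$surv), as.numeric(s2$surv)^exp(dlp))
print(ok)
```

With the default survfit settings the two curves should agree up to floating-point tolerance, so a curve at any covariate vector near the data can serve as the reference in place of the baseline at zero.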