[R] Explaining Survival difference between Stata and R
Paul Johnson
pauljohn at ku.edu
Thu May 13 09:10:35 CEST 2004
Dear Goran (and others)
I did not know about the "eha" package, but reading the docs, I see many
things I've been looking for, including the parametric hazard model with
Weibull baseline. Thanks for the tip, and the package.
I still don't quite understand your point about the reason that coxph
crashes. Why does the difference of scale between the variables cause
trouble? I understand that a redundant set of variables would cause
singularity, but do not see why differing scales for variables would
cause that. Does the explanation lie in the optimization algorithm
itself? I'll read the eha package coxreg source code. I bet that will
show me what's going on in your fix, anyway.
Still, I wish coxph just handled all that stuff.
Just for fun, here are the Stata results for the same small model that
you report. If I ask stata to use the efron method of breaking ties,
then the results do almost exactly match what you found.
stset yrs2, failure (ratify)
stcox haz_wst pol_free , robust nohr efron
Iteration 0: log pseudo-likelihood = -45.380139
Iteration 1: log pseudo-likelihood = -45.00981
Iteration 2: log pseudo-likelihood = -45.001196
Iteration 3: log pseudo-likelihood = -45.001193
Refining estimates:
Iteration 0: log pseudo-likelihood = -45.001193
Cox regression -- Efron method for ties
No. of subjects = 21 Number of obs =
21
No. of failures = 21
Time at risk = 78
Wald chi2(2) =
1.09
Log pseudo-likelihood = -45.001193 Prob > chi2 =
0.5785
------------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
haz_wst | 8.48e-08 8.63e-08 0.98 0.326 -8.43e-08
2.54e-07
pol_free | .0089626 .1047656 0.09 0.932 -.1963741
.2142994
------------------------------------------------------------------------------
Göran Broström wrote:
> Paul and Thomas,
>
> 'coxph' or the data (I got it from Paul) indeed has a problem:
> ------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
>
>
> coef exp(coef) se(coef) z p
> haz.wst 8.53e-08 1 9.47e-08 0.901 0.37
> pol.free NA NA 0.00e+00 NA NA
>
> Likelihood ratio test=0.76 on 1 df, p=0.385 n= 21
> Warning message:
> X matrix deemed to be singular; variable 2 in:
> coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
> ---------------------------------------------------------------
>
> Is it the data? Let's try 'coxreg' (eha):
> ---------------------------------------------------------------
> Call:
> coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
>
> Covariate Mean Coef RR Wald p
> haz.wst 2054901 0.000 1.000 0.372
> pol.free 2.090 0.009 1.009 0.958
>
> Events 21
> Total time at risk 78
> Max. log. likelihood -45.001
> LR test statistic 0.76
> Degrees of freedom 2
> Overall p-value 0.684583
> ----------------------------------------------------------------
>
> This worked just fine (Paul, same results as in Stata?). But, we
> seem to have a scaling problem; lok at the means of the covariates!
> Some rescaling gives:
> ----------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
> data = dat)
>
>
> coef exp(coef) se(coef) z p
> I(haz.wst * 1e-06) 0.08479 1.09 0.095 0.8920 0.37
> pol.free 0.00896 1.01 0.170 0.0526 0.96
>
> Likelihood ratio test=0.76 on 2 df, p=0.685 n= 21
> ----------------------------------------------------------------
> and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
> for sure, but I do know that coxreg centers all covariates before the NR
> procedure starts. Maybe we also should rescale to unit variance? And of
> course scale back the coefficients and se:s at the end?
>
> BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
> version of the PH model: you can do that with 'mlreg' (eha):
> ---------------------------------------------------------------
> Call:
> mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
> data = dat)
>
> Covariate Mean Coef RR Wald p
> I(haz.wst * 1e-06) 2.055 0.097 1.102 0.320
> pol.free 2.090 0.003 1.003 0.985
>
> Events 21
> Total time at risk 78
> Max. log. likelihood -36.324
> LR test statistic 0.93
> Degrees of freedom 2
> Overall p-value 0.627056
> ----------------------------------------------------------------
> Doesn't make much of a difference, though. Efron's method is quite
> robust. (You might check if there are any differences with 'breslow')
>
> Best,
>
> Göran
>
> [...]
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
More information about the R-help
mailing list