[R] Explaining Survival difference between Stata and R
Göran Broström
gb at stat.umu.se
Wed May 12 08:54:15 CEST 2004
On Tue, May 11, 2004 at 07:10:36AM -0700, Thomas Lumley wrote:
> On Mon, 10 May 2004, Paul Johnson wrote:
>
> > Dear Everybody:
> >
> > I'm doing my usual "how does that work in R" thing with some Stata
> > projects. I find a gross gap between the Stata and R in Cox PH models,
> > and I hope you can give me some pointers about what goes wrong. I'm
> > getting signals from R/Survival that the model just can't be estimated,
> > but Stata spits out numbers just fine.
> >
> > I wonder if I should specify initial values for coxph?
>
> It's worth a try. The other question is whether Stata has in fact
> converged -- if the range of rqe is not small then its coefficient may
> actually be infinite.
>
> > I got a dataset from a student who uses Stata and try to replicate in R.
> > I will share data to you in case you want to see for yourself. Let me
> > know if you want text or Stata data file.
>
> I'd like to look at the data. We should be able to get coxph to converge
> when there is a finite mle -- the log partial likelihood is concave.
Paul and Thomas,
'coxph' or the data (I got it from Paul) indeed has a problem:
------------------------------------------------------------
Call:
coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
coef exp(coef) se(coef) z p
haz.wst 8.53e-08 1 9.47e-08 0.901 0.37
pol.free NA NA 0.00e+00 NA NA
Likelihood ratio test=0.76 on 1 df, p=0.385 n= 21
Warning message:
X matrix deemed to be singular; variable 2 in:
coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
---------------------------------------------------------------
Is it the data? Let's try 'coxreg' (eha):
---------------------------------------------------------------
Call:
coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
Covariate Mean Coef RR Wald p
haz.wst 2054901 0.000 1.000 0.372
pol.free 2.090 0.009 1.009 0.958
Events 21
Total time at risk 78
Max. log. likelihood -45.001
LR test statistic 0.76
Degrees of freedom 2
Overall p-value 0.684583
----------------------------------------------------------------
This worked just fine (Paul, same results as in Stata?). But, we
seem to have a scaling problem; lok at the means of the covariates!
Some rescaling gives:
----------------------------------------------------------------
Call:
coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
data = dat)
coef exp(coef) se(coef) z p
I(haz.wst * 1e-06) 0.08479 1.09 0.095 0.8920 0.37
pol.free 0.00896 1.01 0.170 0.0526 0.96
Likelihood ratio test=0.76 on 2 df, p=0.685 n= 21
----------------------------------------------------------------
and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
for sure, but I do know that coxreg centers all covariates before the NR
procedure starts. Maybe we also should rescale to unit variance? And of
course scale back the coefficients and se:s at the end?
BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
version of the PH model: you can do that with 'mlreg' (eha):
---------------------------------------------------------------
Call:
mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free,
data = dat)
Covariate Mean Coef RR Wald p
I(haz.wst * 1e-06) 2.055 0.097 1.102 0.320
pol.free 2.090 0.003 1.003 0.985
Events 21
Total time at risk 78
Max. log. likelihood -36.324
LR test statistic 0.93
Degrees of freedom 2
Overall p-value 0.627056
----------------------------------------------------------------
Doesn't make much of a difference, though. Efron's method is quite
robust. (You might check if there are any differences with 'breslow')
Best,
Göran
[...]
--
Göran Broström tel: +46 90 786 5223
Department of Statistics fax: +46 90 786 6614
Umeå University http://www.stat.umu.se/egna/gb/
SE-90187 Umeå, Sweden e-mail: gb at stat.umu.se
More information about the R-help
mailing list