[R] Explaining Survival difference between Stata and R

Thu May 13 09:10:35 CEST 2004

Dear Goran (and others)

I did not know about the "eha" package, but reading the docs, I see many 
things I've been looking for, including the parametric hazard model with 
Weibull baseline.  Thanks for the tip, and the package.

I still don't quite understand your point about the reason that coxph 
crashes.  Why does the difference of scale between the variables cause 
trouble?  I understand that a redundant set of variables would cause 
singularity, but do not see why differing scales for variables would 
cause that.  Does the explanation lie in the optimization algorithm 
itself?  I'll read the eha package coxreg source code. I bet that will 
show me what's going on in your fix, anyway.

Still, I wish coxph just handled all that stuff.

Just for fun, here are the Stata results for the same small model that 
you report.  If I ask stata to use the efron method of breaking ties, 
then the results do  almost exactly match what you found.

stset yrs2, failure (ratify)
stcox haz_wst  pol_free  , robust nohr efron

Iteration 0:   log pseudo-likelihood = -45.380139
Iteration 1:   log pseudo-likelihood =  -45.00981
Iteration 2:   log pseudo-likelihood = -45.001196
Iteration 3:   log pseudo-likelihood = -45.001193
Refining estimates:
Iteration 0:   log pseudo-likelihood = -45.001193

Cox regression -- Efron method for ties

No. of subjects       =           21               Number of obs   = 
     21
No. of failures       =           21
Time at risk          =           78
                                                    Wald chi2(2)    = 
    1.09
Log pseudo-likelihood =   -45.001193               Prob > chi2     = 
0.5785

------------------------------------------------------------------------------
              |               Robust
           _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. 
Interval]
-------------+----------------------------------------------------------------
      haz_wst |   8.48e-08   8.63e-08     0.98   0.326    -8.43e-08 
2.54e-07
     pol_free |   .0089626   .1047656     0.09   0.932    -.1963741 
.2142994
------------------------------------------------------------------------------

Göran Broström wrote:

> Paul and Thomas,
> 
> 'coxph' or the data (I got it from Paul) indeed has a problem:
> ------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
> 
> 
>              coef exp(coef) se(coef)     z    p
> haz.wst  8.53e-08         1 9.47e-08 0.901 0.37
> pol.free       NA        NA 0.00e+00    NA   NA
> 
> Likelihood ratio test=0.76  on 1 df, p=0.385  n= 21 
> Warning message: 
> X matrix deemed to be singular; variable 2 in: 
> coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) 
> ---------------------------------------------------------------
> 
> Is it the data? Let's try 'coxreg' (eha):
> ---------------------------------------------------------------
> Call:
> coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
> 
> Covariate           Mean       Coef        RR       Wald p
> haz.wst           2054901     0.000     1.000        0.372 
> pol.free            2.090     0.009     1.009        0.958 
> 
> Events                    21 
> Total time at risk            78 
> Max. log. likelihood      -45.001 
> LR test statistic         0.76 
> Degrees of freedom        2 
> Overall p-value           0.684583
> ----------------------------------------------------------------
> 
> This worked just fine (Paul, same results as in Stata?). But, we 
> seem to have a scaling problem; lok at the means of the covariates!
> Some rescaling gives:
> ----------------------------------------------------------------
> Call:
> coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
>     data = dat)
> 
> 
>                       coef exp(coef) se(coef)      z    p
> I(haz.wst * 1e-06) 0.08479      1.09    0.095 0.8920 0.37
> pol.free           0.00896      1.01    0.170 0.0526 0.96
> 
> Likelihood ratio test=0.76  on 2 df, p=0.685  n= 21 
> ----------------------------------------------------------------
> and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
> for sure, but I do know that coxreg centers all covariates before the NR
> procedure starts. Maybe we also should rescale to unit variance? And of
> course scale back the coefficients and se:s at the end? 
> 
> BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
> version of the PH model: you can do that with 'mlreg' (eha): 
> --------------------------------------------------------------- 
> Call:
> mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
>     data = dat)
> 
> Covariate           Mean       Coef        RR     Wald p
> I(haz.wst * 1e-06)  2.055     0.097     1.102      0.320 
> pol.free            2.090     0.003     1.003      0.985 
> 
> Events                    21 
> Total time at risk            78 
> Max. log. likelihood      -36.324 
> LR test statistic         0.93 
> Degrees of freedom        2 
> Overall p-value           0.627056
> ----------------------------------------------------------------
> Doesn't make much of a difference, though. Efron's method is quite
> robust. (You might check if there are any differences with 'breslow')
> 
> Best,
> 
> Göran
> 
> [...]

-- 
Paul E. Johnson                       email: pauljohn at ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700