[R] survival: ridge log-likelihood workaround

Fri Dec 10 16:07:42 CET 2010

------ begin inclusion ---------
Dear all,

I need to calculate likelihood ratio test for ridge regression. In
February I have reported a bug where coxph returns unpenalized
log-likelihood for final beta estimates for ridge coxph regression. In
high-dimensional settings ridge regression models usually fail for lower
values of lambda. As the result of it, in such settings the ridge
regressions have higher values of lambda (e.g. over 100) which means
that the difference between unpenalized log-likelihood and penalized
log-likelihood is not insignificant. I would be grateful if someone can
confirm that the below code is correct workaround.

--- end included message ----

First, the "bug" you report is not a bug.  The log partial likelihood
from a Cox model LPL(beta) is well defined for any vector of
coefficients beta, whether they are result of a maximization or taken
from your daily horoscope.  The loglik component of coxph is the LPL for
the reported coefficients.

For a ridge regression the coxph function maximizes LPL(beta) -
penalty(beta) = penalized partial likelihood = PPL(beta).  You have
correctly recreated the PPL.

Second: how do you do formal tests on such a model?  This is hard.  The
difference LPL1- LPL2 is a chi-square when each is the result of
maximizing the Cox LPL over a set of coefficients; when using a PPL we
are maximizing over something else.  The distribution of the difference
of constrained LPL values can be argued to be a weighed sum of squared
normals where the weights are in (0,1), which is something more complex
than a chisq distribution.  In a world with infinite free time I'd have
pursued this, worked it all out, and added appropriate code to coxph.

What about the difference in PPL values, which is the test you propose?
I'm not aware of any theory showing that these have any relation to a
chi-square distribution.  (Said theory may well exist, and I'd be happy
for pointers.)

Terry Therneau