[R] changes in coxph in "survival" from older version?
Frank Harrell
f.harrell at vanderbilt.edu
Tue May 17 14:19:28 CEST 2011
The problem is the use of variable selection without simultaneous shrinkage.
It will result in an entirely unreliable model and essentially will choose a
random sample of the predictors. See
http://www.childrensmercy.org/stats/faq/faq12.asp
Frank
Shi, Tao wrote:
>
> Hi Frank,
>
> I know it's kind of beyond the scope of R help, but I would appreciate it
> if you
> can elaborate further on this. Are you worrying about this variable
> selection
> approach or you think that there is something wrong with the data (it's
> actually
> a real dataset)? If it's the first one, I believe I can always narrow
> down the
> variables based on univarate analysis and build a multivariate model from
> that.
>
> Many thanks in advance.
>
> ...Tao
>
>
>
>
> ----- Original Message ----
>> From: Frank Harrell <f.harrell at vanderbilt.edu>
>> To: r-help at r-project.org
>> Sent: Mon, May 16, 2011 11:25:20 AM
>> Subject: Re: [R] changes in coxph in "survival" from older version?
>>
>> Please don't be serious about doing variable selection with this
>> dataset.
>> Frank
>>
>> Shi, Tao wrote:
>> >
>> > Hi Terry,
>> >
>> > Really appreciate your help! Sorry for my late reply.
>> >
>> > I did realize that there are way more predictors in the model. My
>> initial
>> > thinking was use that as an initial model for stepwise model
>> selection.
>> > Now I
>> > wonder if the model selection result is still valid if the initial
>> model
>> > didn't
>> > even converge?
>> >
>> > Thanks!
>> >
>> > ...Tao
>> >
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: Terry Therneau <therneau at mayo.edu>
>> >> To: "Shi, Tao" <shidaxia at yahoo.com>
>> >> Cc: r-help at r-project.org
>> >> Sent: Thu, May 12, 2011 6:42:09 AM
>> >> Subject: Re: changes in coxph in "survival" from older version?
>> >>
>> >>
>> >> On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
>> >> > Hi all,
>> >> >
>> >> > I found that the two different versions of "survival" packages,
>> namely
>> >>2.36-5
>> >>
>> >> > vs. 2.36-8 or later, give different results for coxph function.
>> >> Please see
>> >
>> >> > below and the data is attached. The second one was done on Linux,
>> but
>> >>Windows
>> >>
>> >> > gave the same results. Could you please let me know which one I
>> >> should
>> >>trust?
>> >> >
>> >> > Thanks,
>> >>
>> >> In your case, neither. Your data set has 22 events and 17
>> predictors;
>> >> the rule of thumb for a reliable Cox model is 10-20 events per
>> predictor
>> >> which implies no more than 2 for your data set. As a result, the
>> >> coefficients of your model have very wide confidence intervals, the
>> coef
>> >> for Male for instance has se of 3.26, meaning the CI goes from 1/26
>> to
>> >> 26 times the estimate; i.e., there is no biological meaning to the
>> >> estimate.
>> >>
>> >> Nevertheless, why did coxph give a different answer? The later
>> >> version 2.36-9 failed to converge (20 iterations) with a final
>> >> log-likelihood of -19.94, the earlier code converges in 10
>> iterations to
>> >> -19.91. In version 2.36-6 an extra check was put into the maximizer
>> for
>> >> coxph in response to an exceptional data set which caused the
>> routine to
>> >> fail due to overflow of the exp function; the Newton-Raphson
>> iteration
>> >> algorithm had made a terrible guess in it's iteration path, which
>> can
>> >> happen with all NR based search methods.
>> >> I put a limit on the size the linear predictor in the Cox model
>> of
>> >> 21. The basic argument is that exp(linear-predictor) = relative
>> risk
>> >> for a subject, and that there is not much biological meaning for
>> risks
>> >> to be less than exp(-21) ~ 1/(population of the earh). There is
>> more to
>> >> the reasoning, interested parties should look at the comments in
>> >> src/coxsafe.c, a 5 line routine with 25 lines of discussion. I will
>> >> happily accept input the "best" value for the constant.
>> >>
>> >> I never expected to see a data set with both convergence of the
>> LL
>> >> and linear predictors larger than +-15. Looking at the fit (older
>> code)
>> >> > round(fit2$linear.predictor, 2)
>> >> [1] 2.26 0.89 4.96 -19.09 -12.10 1.39 2.82 3.10
>> >> [9] 18.57 -25.25 22.94 8.75 5.52 -27.64 14.88 -23.41
>> >> [17] 13.70 -28.45 -1.84 10.04 12.62 2.54 6.33 -8.76
>> >> [25] 9.68 4.39 2.92 3.51 6.02 -17.24 5.97
>> >>
>> >> This says that, if the model is to be believed, you have several
>> near
>> >> immortals in the data set. (Everyone else on earth will perish
>> first).
>> >>
>> >> Terry Therneau
>> >>
>> >>
>> >>
>> >>
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt University
>> --
>> View this message in context:
>>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3527017.html
>>
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3529014.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list