[R] basehaz() in package 'Survival' and warnings() with coxph

David Winsemius dwinsemius at comcast.net
Fri Aug 10 04:28:22 CEST 2012


On Aug 9, 2012, at 5:53 PM, hazbro wrote:

> My sessionInfo is as follows:
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
snip
>>
>
> It will be difficult to reproduce an example here as the data set I am
> using is very large. I can give you an example:
>
> fit3.1 <- coxph(formula = y ~ sex + ns(ageyrs, df = 2) + AdmissionSource +
>     X1 + X2 + X3 + X5 + X6 + X7 + X11 + X12 + X13 + X14 + X15 +
>     X16 + X17 + X18 + X19 + X20 + X22 + X24 + X25 + X26 + X27 +
>     X28 + X29 + X32 + X33 + X35 + X38 + X39 + X40 + X41 + X42 +
>     X43 + X44 + X47 + X49 + X53 + X54 + X55 + X58 + X59 + X62 +
>     X68 + X69 + X78 + X80 + X81 + X84 + X85 + X86 + X93 + X95 +
>     X98 + X100 + X101 + X102 + X105 + X107 + X108 + X109 + X110 +
>     X112 + X113 + X114 + X115 + X116 + X117 + X121 + X122 + X125 +
>     X127 + X128 + X129 + X131 + X132 + X133 + X134 + X138 + X140 +
>     X143 + X145 + X146 + X148 + X150 + X151 + X153 + X157 + X158 +
>     X159 + X164 + X197 + X200 + X202 + X203 + X204 + X205 + X211 +
>     X214 + X217 + X224 + X228 + X233 + X237 + X244 + X249 + X254 +
>     X258 + X259 + X260 + CharlsonIndex + ethnic + day + season +
>     ln, data = dat2)
>

> haz <- basehaz(fit3.1)  # gives 507 unique time points in haz$time
>
> fit2 <- coxph(y ~ ns(ageyrs, df = 2) + day + ln + sex + AdmissionSource +
>     season + CharlsonIndex, data = dat1)
>
> haz <- basehaz(fit2)  # gives 611 unique time points in haz$time
>
Regardless of the discrepancy, it appears you have on the order of 100
to 200 variables with only 500 to 600 events.
>
> I get the following warnings() with fit3.1:
> Warning message:
> In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>  Loglik converged before variable   ; beta may be infinite.
>
> Also the coefficients of the variables that the warning occurs for are
> very high.

That suggests the warning should be heeded: you probably have numerical
stability problems, possibly from highly collinear variables or from
complete separation within some strata.
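
Both conditions can be screened for before fitting. What follows is only
a rough sketch on made-up data: the data frame `toy` and the columns
`x1`, `x2`, `x3` are hypothetical and have nothing to do with your
`dat2`. It checks the rank of the design matrix (collinearity) and
counts events within each level of each binary covariate (a level with
no events at all is one way a coefficient can drift toward infinity).

```r
# Hypothetical toy data; none of these names come from the original post.
n <- 60
toy <- data.frame(
  time   = seq_len(n),
  status = rep(c(1, 1, 0), length.out = n),  # 1 = event, 0 = censored
  x1     = rep(c(0, 1), length.out = n)
)
toy$x2 <- toy$x1                             # exact copy of x1: collinear
toy$x3 <- 0
toy$x3[which(toy$status == 0)[1:5]] <- 1     # flag set only on censored rows

# 1. Collinearity: a design matrix (intercept dropped) of deficient rank
#    means at least one column is a linear combination of the others.
X <- model.matrix(~ x1 + x2 + x3, data = toy)[, -1]
rank_deficient <- qr(X)$rank < ncol(X)

# 2. Separation: number of events within each level of each covariate.
events_by_level <- sapply(toy[c("x1", "x2", "x3")],
                          function(v) tapply(toy$status, v, sum))
no_events <- colnames(events_by_level)[apply(events_by_level == 0, 2, any)]

rank_deficient   # is the design matrix rank deficient?
events_by_level  # rows are covariate levels, columns are covariates
no_events        # covariates with a level that never has an event
```

With real data the same two checks would be run on the model matrix of
the actual formula; a covariate listed in `no_events` is a natural
suspect for a "beta may be infinite" warning.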

> The Wald test suggests dropping these terms, whereas the LRT suggests
> keeping them. What should I do in terms of model selection?

I worry that you have already committed many modeling sins. If you
started out with 260 variables and dropped a bunch of them with a
step-down procedure, then you are currently underestimating the number
of degrees of freedom that you should be using. My guess is that if you
used the proper degrees of freedom, the LRT would not support keeping
them. You have too few data points to support that many variables. As
Bert Gunter often recommends... get thee to a statistician.
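
To make the degrees-of-freedom point concrete, here is a small
back-of-the-envelope sketch. The event counts are the rough 500 to 600
figure above and the 260 is the assumed size of the starting variable
pool; the LRT statistic of 200 and the 130 retained coefficients are
purely hypothetical numbers picked for illustration, not values from
your fit.

```r
events     <- c(500, 600)  # rough event counts discussed above
candidates <- 260          # assumed number of variables screened

# Events per candidate variable; a commonly cited rule of thumb asks
# for roughly 10 or more.
events_per_variable <- events / candidates

# The same likelihood-ratio statistic judged against two reference
# distributions (both numbers below are hypothetical).
lrt_stat    <- 200
df_retained <- 130         # coefficients left after step-down selection
df_screened <- candidates  # coefficients actually examined

p_naive    <- pchisq(lrt_stat, df = df_retained, lower.tail = FALSE)
p_screened <- pchisq(lrt_stat, df = df_screened, lower.tail = FALSE)

events_per_variable
c(naive = p_naive, screened = p_screened)
```

The statistic that looks overwhelming against the retained-terms degrees
of freedom is unremarkable once every screened variable is counted,
which is the sense in which the selection step is not free.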

-- 

David Winsemius, MD
Alameda, CA, USA


