[R] covariate selection in cox model (counting process)
Thomas Lumley
tlumley at u.washington.edu
Wed Jul 28 19:42:01 CEST 2004
On Wed, 28 Jul 2004, Mayeul KAUFFMANN wrote:
> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> with
> > and without the cluster(id) term.
>
> (Thank you "again !", Thomas.)
>
> At first look, the values seemed very similar (see below, case 2).
> However, to check this without being too subjective, and without a
> specific test, I needed other values to assess the size of the
> differences: what is similar, what is not?
>
I think the econometricians have theory for this (comparing the whole
covariance matrices).
-thomas
>
> ==========================================================================
> =====
> CASE 1
> I first estimated the model without modeling dependence:
>
> Call:
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
> pop + pib + pib2 + crois + instab.x1 + instab.autres, data = xstep)
>
>
> coef exp(coef) se(coef) robust se z p
> pop 0.3606 1.434 0.0978 0.1182 3.05 2.3e-03
> pib -0.5947 0.552 0.1952 0.1828 -3.25 1.1e-03
> pib2 -0.4104 0.663 0.1452 0.1270 -3.23 1.2e-03
> crois -0.0592 0.943 0.0245 0.0240 -2.46 1.4e-02
> instab.x1 2.2059 9.079 0.4692 0.4097 5.38 7.3e-08
> instab.autres 0.9550 2.599 0.4700 0.4936 1.93 5.3e-02
>
> Likelihood ratio test=74 on 6 df, p=6.2e-14 n= 7286
>
> There seems to be a strong linear relationship between standard errors
> (se, or naive se) and robust se.
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var)) 0.96103 0.04064 23.65 2.52e-06 ***
> Multiple R-Squared: 0.9911, Adjusted R-squared: 0.9894
>
>
> ==========================================================================
> =====
> CASE 2
>
> Then I added a variable (pxcw) measuring the proximity of the previous
> event (1>pxcw>0)
>
> n= 7286
> coef exp(coef) se(coef) robust se z p
> pxcw 0.9063 2.475 0.4267 0.4349 2.08 3.7e-02
> pop 0.3001 1.350 0.1041 0.1295 2.32 2.0e-02
> pib -0.5485 0.578 0.2014 0.1799 -3.05 2.3e-03
> pib2 -0.4033 0.668 0.1450 0.1152 -3.50 4.6e-04
> crois -0.0541 0.947 0.0236 0.0227 -2.38 1.7e-02
> instab.x1 1.9649 7.134 0.4839 0.4753 4.13 3.6e-05
> instab.autres 0.8498 2.339 0.4693 0.4594 1.85 6.4e-02
>
> Likelihood ratio test=78.3 on 7 df, p=3.04e-14 n= 7286
>
>
> Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var)) 0.98397 0.02199 44.74 8.35e-09 ***
> Multiple R-Squared: 0.997, Adjusted R-squared: 0.9965
>
> The naive standard errors (se) seem closer to the robust se than they were
> when not modeling for dependence.
> 0.98397 is very close to one, R^2 grew, etc.
> The dependence is high (risk is multiplied by 2.475 the day after an
> event)
> but conditional independence (given covariates) seems hard to reject.
>
>
> ==========================================================================
> =====
> CASE 3
> Finally, I compared these results with those without repeated events
> (which gives a smaller dataset). A country is removed as soon as we
> observe its first event.
> (robust se is still computed, even if naive se should in fact be used here
> to compute the pvalue)
>
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
> pop + pib + pib2 + crois + instab.x1 + instab.autres, data =
> xstep[no.previous.event, ])
>
> coef exp(coef) se(coef) robust se z p
> pop 0.4236 1.528 0.1030 0.1157 3.66 2.5e-04
> pib -0.7821 0.457 0.2072 0.1931 -4.05 5.1e-05
> pib2 -0.3069 0.736 0.1477 0.1254 -2.45 1.4e-02
> crois -0.0432 0.958 0.0281 0.0258 -1.67 9.5e-02
> instab.x1 1.9925 7.334 0.5321 0.3578 5.57 2.6e-08
> instab.autres 1.3571 3.885 0.5428 0.5623 2.41 1.6e-02
>
> Likelihood ratio test=66.7 on 6 df, p=1.99e-12 n=5971 (2466 observations
> deleted due to missing)
>
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
> Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var)) 0.86682 0.07826 11.08 0.000104 ***
> Residual standard error: 0.06328 on 5 degrees of freedom
> Multiple R-Squared: 0.9608, Adjusted R-squared: 0.953
>
>
> There seems to be no evidence that robust se is more different from se in
> case 2 than in case 3 (and case 1).
> It even seems closer.
>
> I conclude that conditional independence (martingaleness) cannot be
> rejected in CASE 2, when modeling the dependence between events with a
> covariate.
>
> Mayeul KAUFFMANN
> Univ. Pierre Mendes France
> Grenoble - France
>
>
>
> > > Then, there is still another option. In fact, I already modelled
> > > explicitely the influence of past events with a "proximity of last
> event"
> > > covariate, assuming the dependence on the last event decreases at a
> > > constant rate (for instance, the proximity covariate varies from 1 to
> 0.5
> > > in the first 10 years after an event, then from 0.5 to 0.25 in the
> next
> > > ten years, etc).
> > >
> > > With a well chosen modelisation of the dependence effect, the events
> > > become conditionnaly independent, I do not need a +cluster(id) term,
> and I
> > > can use fit$loglik to make a covariate selection based on BIC, right?
> >
> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are
> with
> > and without the cluster(id) term.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
More information about the R-help
mailing list