[R] Question regarding significance of a covariate in a coxme survival model
Teresa Iglesias
tliglesias at ucdavis.edu
Fri Aug 27 22:32:06 CEST 2010
Christopher David Desjardins <desja004 <at> umn.edu> writes:
>
> Hi,
> I am running a Cox Mixed Effects Hazard model using the library coxme. I
> am trying to model time to onset (age_sym1) of thought problems (e.g.
> hearing voices) (sym1). As I have siblings in my dataset, I have
> decided to account for this by including a random effect for family
> (famid). My covariate of interest is Mother's diagnosis where a 0 is
> bipolar, 1 is control, and 2 is major depression. I am trying to fit
> the following model.
>
> thorp1 <- coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data =
> bip.surv)
>
> Which provides the following output:
>
> -------------------------------------------------
> > thorp1
> Cox mixed-effects model fit by maximum likelihood
> Data: bip.surv
> events, n = 99, 189 (3 observations deleted due to missingness)
> Iterations= 10 54
> NULL Integrated Penalized
> Log-likelihood -479.0372 -467.3549 -435.2096
> Chisq df p AIC BIC
> Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58
> Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54
> Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid)
> Fixed coefficients
> coef exp(coef) se(coef) z p
> lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063
> lifedxmMAJOR -0.6329957 0.5309987 0.3460847 -1.83 0.0670
> Random effects
> Group Variable Std Dev Variance
> famid Intercept 0.9877770 0.9757033
>
> --------------------------------------------------
>
> So it appears that there is a difference in hazard rates associated with
> Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I fit a
> model without this covariate present and decided to compare the models
> with AIC and see if fit decreased with this covariate not in the model.
>
> ----------------------------------------------------------
> thorp1.n <- coxme(Surv(age_sym1, sym1) ~ (1 | famid), data = bip.surv)
> > thorp1.n
> Cox mixed-effects model fit by maximum likelihood
> Data: bip.surv
> events, n = 99, 189 (3 observations deleted due to missingness)
> Iterations= 10 54
> NULL Integrated Penalized
> Log-likelihood -479.0372 -471.3333 -436.0478
> Chisq df p AIC BIC
> Integrated loglik 15.41 1.00 0.00008663 13.41 10.81
> Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37
> Model: Surv(age_sym1, sym1) ~ (1 | famid)
> Random effects
> Group Variable Std Dev Variance
> famid Intercept 1.050949 1.104495
> ------------------------------------------------------------
>
> The problem is that the AIC for the model w/o lifedxm is 13.41 and the
> model w/ lifedxm is 17.36. So fit actually improved without lifedxm in
> the model but really the fit is indistinguishable if I use Burnham &
> Anderson, 2002's criteria.
>
> So my conundrum is this - The AIC and the p-values appear to disagree.
> The p-value would suggest that lifedxm should be included in the model
> and the AIC says it doesn't improve fit. In general, I tend to favor the
> AIC (I usually work with large sample sizes and multilevel models) but I
> am just beginning with survival models and I am curious if a survival
> model guru might suggest whether lifedxm needs to be in the model or not
> based on my results here? Would people generally use the AIC in this
> situation? Also, I can't run anova() on this models because of the
> random effect.
>
> I am happy to provide the data if necessary.
>
> Please cc me on a reply.
>
> Thanks,
> Chris
>
>
Hi Chris,
Did you get an answer to why the p-value and AIC score disagreed?
I am getting the same results with my own data. They're not small
disagreements either. The AIC score is telling me the opposite of
what the p-value and the parameter coef are saying.
The p-value and the coef for the predictor variable are in agreement.
I've also noticed in some published papers with tables containing
p-values and AIC scores that the chisq p-value and AIC score
are in opposition too. This makes me think I'm missing something
and that this all actually makes sense.
However given that AIC = − 2 (log L) + 2K (where K is the number of parameters)
and seeing as how the log-likelihood scores below are in the hundreds, shouldn't
the AIC score be in the hundreds as well??
--------------------------------------
model0 (intercept only model)
NULL Integrated Penalized
Log-likelihood -119.8470 -117.9749 -113.1215
Chisq df p AIC BIC
Integrated loglik 3.74 1.00 0.052989 1.74 0.08
Penalized loglik 13.45 7.06 0.063794 -0.68 -12.43
Random effects
Group Variable Std Dev Variance
Site Intercept 0.6399200 0.4094977
_____
model1 (before vs after)
NULL Integrated Penalized
Log-likelihood -119.8470 -112.1598 -108.1663
Chisq df p AIC BIC
Integrated loglik 15.37 2.00 0.00045863 11.37 8.05
Penalized loglik 23.36 7.06 0.00153710 9.25 -2.49
Fixed coefficients
Coef exp(coef) se(coef) z p
Time -1.329997 0.2644779 0.4009229 -3.32 0.00091
Random effects
Group Variable Std Dev Variance
Site Intercept 0.5625206 0.3164294
______
model2 (stim1 vs stim2)
NULL Integrated Penalized
Log-likelihood -119.8470 -117.8677 -113.167
Chisq df p AIC BIC
Integrated loglik 3.96 2.00 0.138160 -0.04 -3.37
Penalized loglik 13.36 7.83 0.093314 -2.31 -15.34
Fixed coefficients
coef exp(coef) se(coef) z p
stim 0.1621367 1.176021 0.3534788 0.46 0.65
Random effects
Group Variable Std Dev Variance
Site Intercept 0.6262600 0.3922016
----------------------------------------------
If you didn't get an answer, hopefully, someone that understands all this will
reply for both of us.
Thanks,
-Teresa
More information about the R-help
mailing list