[R-sig-ME] Multi-level qualitative (fixed-effects) factors

Mon Aug 2 18:51:09 CEST 2010

Dear List,

For the analysis of my GLMM I am using AIC values rather than stepwise regression to simplify the model. I have developed some candidate models and am running through them now. I know a priori that there are some important interactions, and I have also removed all the factors I consider unimportant.

I have many multi-level factors, e.g. habit: aquatic, terrestrial, epiphyte, etc.

I ran the model with habit as a factor:

> model111 <-lmer(threatornot~1+(1|a/b) + habit, family=binomial)

> Generalized linear mixed model fit by the Laplace approximation 
> Formula: threatornot ~ 1 + (1 | order/family) + habit 
>   AIC  BIC logLik deviance
>  1406 1436 -696.9     1394
> Random effects:
>  Groups       Name        Variance   Std.Dev.  
>  family:order (Intercept) 6.9892e-01 8.3602e-01
>  order        (Intercept) 4.2292e-14 2.0565e-07
> Number of obs: 1116, groups: family:order, 43; order, 9
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)   
> (Intercept) -0.04803    0.19174  -0.250  0.80219   
> habit2       1.10627    0.41607   2.659  0.00784 **
> habit3       0.92578    0.78141   1.185  0.23611   
> habit4       0.14383    0.38477   0.374  0.70856

---
This model had an AIC of 1406.

I then re-ran the model with only aquatic and got a lower AIC value (1395), which I guess is to be expected, as aquatic is highly significant and aquatic species are more prone to threat (my response).

> > model112 <-lmer(threatornot~1+(1|a/b) + aquatic, family=binomial)
> > model112
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: threatornot ~ 1 + (1 | order/family) + aquatic 
>   AIC  BIC logLik deviance
>  1395 1415 -693.4     1387
> Random effects:
>  Groups       Name        Variance Std.Dev.
>  family:order (Intercept) 0.60007  0.77464 
>  order        (Intercept) 0.00000  0.00000 
> Number of obs: 1116, groups: family:order, 43; order, 9
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)   0.1572     0.1827   0.860 0.389613    
> aquatic      -0.6683     0.1737  -3.847 0.000119 ***
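
As a rough check on the comparison, the AIC and BIC values in the two printed summaries above can be reproduced from the reported log-likelihoods. This is only a sketch: the parameter counts are an assumption (number of fixed-effect coefficients plus the two random-intercept variances), and the variable names are made up for illustration.

```r
# Values copied from the two printed summaries above.
n_obs  <- 1116
loglik <- c(model111 = -696.9, model112 = -693.4)

# Assumed parameter counts: fixed-effect coefficients
# plus 2 random-intercept variances (family:order and order).
k <- c(model111 = 4 + 2, model112 = 2 + 2)

aic <- -2 * loglik + 2 * k            # AIC = -2 logLik + 2k
bic <- -2 * loglik + log(n_obs) * k   # BIC = -2 logLik + k log(n)
delta_aic <- aic - min(aic)           # difference from the lowest-AIC model

print(round(cbind(k, aic, bic, delta_aic), 1))
```

Under that assumed count the rounded values agree with the printed AIC and BIC columns, and the difference between the two models is about 11 AIC units.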

My question is: when I developed the candidate models I thought using multi-level factors would be OK and that I would be able to tease out the individual levels. If I split the factors into levels from the beginning, then I am left with a huge number of candidate models. This would not be a problem in stepwise regression, as I could just remove the habit with the least significant p-value.
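
To make the two codings concrete, here is a minimal base R sketch of a multi-level factor versus a collapsed two-level version, and of how quickly the candidate set grows if each level is treated as its own term. The data are a hypothetical toy vector; only the level names aquatic, terrestrial and epiphyte come from the post, "other" is a placeholder, and the choice of reference level is an assumption.

```r
# Hypothetical toy data, not the real data set.
habit <- factor(c("aquatic", "terrestrial", "epiphyte", "other",
                  "aquatic", "terrestrial"),
                levels = c("terrestrial", "aquatic", "epiphyte", "other"))

# Treatment coding: a 4-level factor contributes 3 dummy columns,
# analogous to habit2, habit3 and habit4 in the first summary.
X <- model.matrix(~ habit)
print(colnames(X))

# Collapsing to a two-level factor keeps a single contrast
# (aquatic versus everything else).
habit2lev <- habit
levels(habit2lev) <- list(aquatic     = "aquatic",
                          non_aquatic = c("terrestrial", "epiphyte", "other"))
print(table(habit2lev))

# If every dummy column may be kept or dropped independently, one
# 4-level factor alone yields 2^3 possible fixed-effect structures.
n_candidates <- 2^(nlevels(habit) - 1)
print(n_candidates)
```

With several such factors the counts multiply, which is the growth in candidate models described above.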

If I remove habits I "feel" are unimportant from the beginning, I feel I would be limiting the model too much.

I hope this makes sense!

Has anyone else had this problem, or can anyone see a workaround?

Thanks

Peter