[R-sig-ME] lmer models-confusing results - more information!

Jarrod Hadfield j.hadfield at ed.ac.uk
Thu Dec 3 11:19:31 CET 2009


Dear Gwyneth,

Since you're not getting any answers, I'll give it a go, at the risk
of being wrong.

The likelihood for a non-Gaussian GLMM cannot be obtained in closed
form and has to be approximated. Often the approximation is good, but
in some cases it can be poor, particularly with binary data when the
incidence is extreme (very low or very high) and/or there is little
replication within factor levels. In extreme cases the parameter
estimates +/- 2*SE do not even include the "true" values.
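
One way to get a feel for how good the approximation is (a sketch,
assuming the current lme4 interface, where glmer() has replaced
lmer(..., family = ...)) is to refit the model with adaptive
Gauss-Hermite quadrature and compare it to the default Laplace fit;
with a single scalar random effect this is cheap to do:

library(lme4)

## Default fit: Laplace approximation (equivalent to nAGQ = 1)
m_laplace <- glmer(br.su ~ factor(art.n) + factor(yr) + grp.sz + rain +
                     veg + wood + (1 | grp.id),
                   data = hornbill, family = binomial)

## More accurate approximation: adaptive Gauss-Hermite quadrature with
## 10 points (only available for a single scalar random effect)
m_agq <- update(m_laplace, nAGQ = 10)

## If the fixed-effect estimates differ appreciably between the two
## fits, the Laplace approximation is probably not trustworthy here.
cbind(Laplace = fixef(m_laplace), AGQ10 = fixef(m_agq))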

From your fixed-effects summary it appears that the reproductive
successes within some factor levels are all zero. If so, this may well
be what is causing the problem, and treating year as a random effect
may help. MCMC solutions are probably more robust for these types of
data because they use approximations that become more exact the longer
you run the analysis.
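
For concreteness, a sketch of the two suggestions (again assuming the
current lme4 interface, and for the MCMC fit the MCMCglmm package with
a 0/1 response; the prior and chain length below are only illustrative
and should be adapted to your own data):

## Year as a random effect instead of an 8-level fixed factor
m_yr <- glmer(br.su ~ factor(art.n) + grp.sz + rain + veg + wood +
                (1 | grp.id) + (1 | yr),
              data = hornbill, family = binomial)

## MCMC alternative: "categorical" is the MCMCglmm family for a 0/1
## response, and the residual variance is fixed at 1 because it is not
## identifiable from binary data.
library(MCMCglmm)
prior <- list(R = list(V = 1, fix = 1),
              G = list(G1 = list(V = 1, nu = 0.002),
                       G2 = list(V = 1, nu = 0.002)))
m_mcmc <- MCMCglmm(br.su ~ factor(art.n) + grp.sz + rain + veg + wood,
                   random = ~ grp.id + yr,
                   family = "categorical",
                   prior  = prior, data = hornbill,
                   nitt = 130000, thin = 100, burnin = 30000)
summary(m_mcmc)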

With regard to an earlier email: over-dispersion cannot occur in
binary data, because the mean determines the variance completely. This
does not mean that the probability of success is constant (after
conditioning on the model); it just means that any heterogeneity
cannot be observed and therefore cannot be estimated. In short, you
don't need to worry about it.
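
A toy illustration of that last point (not from the original thread):
for a 0/1 outcome the variance is a deterministic function of the
mean, so there is no separate dispersion parameter that binary data
could inform.

## For a Bernoulli variable, Var(y) = p * (1 - p) is completely
## determined by the mean p, so over-dispersion cannot be detected
## (or estimated) from binary responses.
p <- seq(0, 1, by = 0.1)
cbind(mean = p, variance = p * (1 - p))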

Cheers,

Jarrod

On 3 Dec 2009, at 06:33, Gwyneth Wilson wrote:

>
> I have been running lmer models in R, looking at what affects
> reproductive success in Ground Hornbills (a South African bird). My
> response variable is breeding success, which is binomial (0-1), and my
> random effect is group ID. My explanatory variables are rainfall,
> vegetation, group size, year, nests, and proportion of open woodland.
>
> I have run numerous models successfully, but I am confused about the
> outputs. When I run my first model with all my variables (all
> additive), I get a low AIC value with only a few of the variables
> being significant. When I take out the variables that are not
> significant, my AIC becomes higher but I have more significant
> variables! When I keep taking out the non-significant variables, I am
> left with a model that has nests, open woodland, and group size as
> extremely significant, BUT the AIC is high! Why is my AIC value
> increasing when I have fewer variables that are all significant and
> seem to best explain my data? Do I look only at the AIC when choosing
> the 'best' model, or only at the p-values, or both? The model with the
> lowest AIC at the moment has the most variables, and most of them are
> not significant.
>
> Please help. Any suggestions would be great!
>
>
>
> Here is some more information and some of my outputs:
>
>
>
> The first model has all my variables included, and I get a low AIC
> with only grp.sz and wood being significant:
>
>
>
> model1 <- lmer(br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg + wood +
>                  (1 | grp.id), data = hornbill, family = binomial)
>> summary(model1)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg +  
> wood +      (1 | grp.id)
>   Data: hornbill
>   AIC   BIC    logLik   deviance
> 138.5 182.3  -55.26    110.5
> Random effects:
> Groups Name        Variance Std.Dev.
> grp.id (Intercept) 1.2913   1.1364
> Number of obs: 169, groups: grp.id, 23
>
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> (Intercept)    -3.930736   3.672337  -1.070   0.2845
> factor(art.n)1  1.462829   0.903328   1.619   0.1054
> factor(yr)2002 -2.592315   1.764551  -1.469   0.1418
> factor(yr)2003 -3.169365   1.759981  -1.801   0.0717 .
> factor(yr)2004  0.702210   1.341524   0.523   0.6007
> factor(yr)2005 -2.264257   1.722130  -1.315   0.1886
> factor(yr)2006  2.129728   1.270996   1.676   0.0938 .
> factor(yr)2007 -0.579961   1.390345  -0.417   0.6766
> factor(yr)2008 -1.062933   1.640774  -0.648   0.5171
> grp.sz          1.882616   0.368317   5.111  3.2e-07 ***
> rain           -0.005896   0.003561  -1.656   0.0977 .
> veg            -1.993443   1.948738  -1.023   0.3063
> wood            6.832543   3.050573   2.240   0.0251 *
>
>
> Then I carry on and remove variables that I think are not having an
> influence on breeding success, such as year, vegetation and rain, and
> I get this:
>
> model3 <- lmer(br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id),
>                data = hornbill, family = binomial)
>> summary(model3)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id)
>   Data: hornbill
>   AIC    BIC    logLik deviance
> 143.8  159.4  -66.88    133.8
> Random effects:
> Groups Name        Variance Std.Dev.
> grp.id (Intercept) 0.75607  0.86953
> Number of obs: 169, groups: grp.id, 23
>
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> (Intercept)     -8.6619     1.3528  -6.403 1.52e-10 ***
> factor(art.n)1   1.5337     0.6420   2.389   0.0169 *
> grp.sz           1.6631     0.2968   5.604 2.09e-08 ***
> wood             3.2177     1.5793   2.037   0.0416 *
>
> So all the variables are significant, but the AIC value is higher!
>
> I thought that with fewer variables, all of which are significant and
> so are influencing breeding success, the AIC should be lower. Why is
> it higher in this model?
> Do I look only at the AIC values and ignore the p-values, or only at
> the p-values?
>
> Thanks!!
>
>
>





