[R-sig-ME] lmer models-confusing results - more information!
Ben Bolker
bolker at ufl.edu
Thu Dec 3 20:43:06 CET 2009
Gwyneth Wilson wrote:
> I have been running lmer models in R, looking at what affects
> reproductive success in Ground Hornbills (a South African bird). My
> response variable is breeding success and is binomial (0-1) and my
> random effect is group ID. My predictor variables include rainfall,
> vegetation, group size, year, nests, and proportion of open woodland.
>
> I have run numerous models successfully, but I am confused about what
> the outputs mean. When I run my first model with all my variables (all
> additive), I get a low AIC value with only a few of the variables
> being significant. When I take out the variables that are not
> significant, my AIC becomes higher but I have more significant
> variables! When I keep taking out the non-significant variables, I am
> left with a model that has nests, open woodland, and group size as
> being extremely significant BUT the AIC is high! Why is my AIC value
> increasing when I have fewer variables that are all significant and
> seem to be best explaining my data? Do I look at only the AIC when
> choosing the 'best' model or do I look at only the p-values? Or both?
> The model with the lowest AIC at the moment has the most variables,
> and most of them are not significant.
This happens a lot when you have correlated variables. Although I
don't agree with absolutely everything it says, Zuur et al. 2009 is a
good starting point for looking at this. Correlated variables can
easily explain a lot of the pattern collectively while none of them
explains much individually.
Zuur, A. F., E. N. Ieno, and C. S. Elphick. 2009. A protocol for data
exploration to avoid common statistical problems. Methods in Ecology and
Evolution. doi: 10.1111/j.2041-210X.2009.00001.x.
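As a rough illustration of that last point, here is a minimal sketch
using simulated data (NOT the hornbill data; the variable names x1, x2
and y are made up for the example). Two nearly identical predictors are
fitted together in a plain binomial glm, with no random effect, to keep
the example self-contained:

```r
# Simulated illustration only: these are not the hornbill data.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # x2 is almost a copy of x1
y  <- rbinom(n, size = 1, prob = plogis(1.5 * x1))

cor(x1, x2)                      # very close to 1

full <- glm(y ~ x1 + x2, family = binomial)
null <- glm(y ~ 1,       family = binomial)

# Individual Wald tests: the standard errors of x1 and x2 are inflated
# by the collinearity, so each term on its own tends to look weak.
summary(full)$coefficients

# Joint likelihood-ratio test of x1 and x2 together.
lrt_stat <- deviance(null) - deviance(full)
p_joint  <- pchisq(lrt_stat, df = 2, lower.tail = FALSE)
p_joint
```

The joint test should be strongly significant even when the individual
z tests are not, which is the same pattern as a full model with a low
AIC but few "significant" terms.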
In general, you should *either* (1) fit all sensible models and
model-average the results (if you are interested in prediction) or (2)
use the full model to evaluate p-values, test hypotheses etc. (provided
you have _already_ removed correlated variables). In general (although
Murtaugh 2009 provides a counterexample of sorts), you should **not**
select a model and then (afterwards) evaluate the significance of the
parameters in the model ...
Murtaugh, P. A. 2009. Performance of several variable-selection methods
applied to real ecological data. Ecology Letters 12:1061-1068. doi:
10.1111/j.1461-0248.2009.01361.x.
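For option (1), the usual ingredient is a set of Akaike weights, one
per candidate model. A minimal sketch follows; the helper name
akaike_weights is made up for this example, and the two AIC values are
the ones reported in the summaries quoted in this message. Two models
are of course not "all sensible models", so treat this as arithmetic
only:

```r
# Hypothetical helper: Akaike weights from a vector of AIC values.
akaike_weights <- function(aic) {
  delta <- aic - min(aic)   # AIC differences from the best model
  w <- exp(-delta / 2)      # relative likelihood of each model
  w / sum(w)                # normalise so the weights sum to 1
}

# AIC values as reported in the two quoted summaries.
aic <- c(model1 = 138.5, model3 = 143.8)
w <- akaike_weights(aic)
round(w, 3)                 # model1 carries most of the weight
```

Model-averaged predictions would then be the weighted mean of each
model's predictions, using these weights.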
>
> Please help. Any suggestions would be great!!
>
> Here is some more information and some of my outputs:
>
> The first model has all my variables included and I get a low AIC
> with only grp.sz and wood being significant:
>
> model1<-lmer(br.su~factor(art.n)+factor(yr)+grp.sz+rain+veg+wood+(1|grp.id),data=hornbill,family=binomial)
>
>> summary(model1)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg + wood + (1 | grp.id)
>    Data: hornbill
>    AIC   BIC logLik deviance
>  138.5 182.3 -55.26    110.5
> Random effects:
>  Groups Name        Variance Std.Dev.
>  grp.id (Intercept) 1.2913   1.1364
> Number of obs: 169, groups: grp.id, 23
>
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> (Intercept)    -3.930736   3.672337  -1.070   0.2845
> factor(art.n)1  1.462829   0.903328   1.619   0.1054
> factor(yr)2002 -2.592315   1.764551  -1.469   0.1418
> factor(yr)2003 -3.169365   1.759981  -1.801   0.0717 .
> factor(yr)2004  0.702210   1.341524   0.523   0.6007
> factor(yr)2005 -2.264257   1.722130  -1.315   0.1886
> factor(yr)2006  2.129728   1.270996   1.676   0.0938 .
> factor(yr)2007 -0.579961   1.390345  -0.417   0.6766
> factor(yr)2008 -1.062933   1.640774  -0.648   0.5171
> grp.sz          1.882616   0.368317   5.111  3.2e-07 ***
> rain           -0.005896   0.003561  -1.656   0.0977 .
> veg            -1.993443   1.948738  -1.023   0.3063
> wood            6.832543   3.050573   2.240   0.0251 *
>
>
> Then I carry on and remove variables that I think are not having an
> influence on breeding success, like year, vegetation and rain, and
> I get this:
>
> model3<-lmer(br.su~factor(art.n)+grp.sz+wood+(1|grp.id),data=hornbill,family=binomial)
>
>> summary(model3)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id)
>    Data: hornbill
>    AIC   BIC logLik deviance
>  143.8 159.4 -66.88    133.8
> Random effects:
>  Groups Name        Variance Std.Dev.
>  grp.id (Intercept) 0.75607  0.86953
> Number of obs: 169, groups: grp.id, 23
>
> Fixed effects:
>                Estimate Std. Error z value Pr(>|z|)
> (Intercept)     -8.6619     1.3528  -6.403 1.52e-10 ***
> factor(art.n)1   1.5337     0.6420   2.389   0.0169 *
> grp.sz           1.6631     0.2968   5.604 2.09e-08 ***
> wood             3.2177     1.5793   2.037   0.0416 *
>
> So all the variables are significant but the AIC value is higher!
>
> I thought that having fewer variables, all of them significant, means
> they are influencing breeding success, so why is my AIC higher in
> this model? Do I only look at the AIC values and ignore the p-values?
> Or only look at the p-values?
>
> Thanks!!
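The AIC values can be reproduced by hand from the quoted summaries,
which may make the pattern less mysterious. A minimal sketch, assuming
AIC = deviance + 2k, with k counting the fixed-effect coefficients plus
the one random-intercept variance (14 for model1, 5 for model3; these
counts are my reading of the output, and they also reproduce the
reported BIC values). The likelihood-ratio test at the end is only
approximate for a GLMM fitted by the Laplace approximation:

```r
# Numbers copied from the two summaries quoted above.
dev <- c(model1 = 110.5,  model3 = 133.8)    # deviance = -2 * logLik
ll  <- c(model1 = -55.26, model3 = -66.88)   # log-likelihoods
k   <- c(model1 = 14,     model3 = 5)        # fixed effects + 1 variance

aic <- dev + 2 * k
aic    # reproduces the reported 138.5 and 143.8

# model3 is nested in model1: it drops the 7 year contrasts, rain and veg.
lrt_stat <- unname(2 * (ll["model1"] - ll["model3"]))
lrt_df   <- unname(k["model1"] - k["model3"])
p_joint  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)
p_joint  # between 0.005 and 0.01
```

Dropping year, rain and veg saves 9 parameters (18 AIC units of
penalty) but costs about 23 units of deviance, so the AIC goes up by
about 5. Taken together the dropped terms explain a fair amount, even
though none of them looks convincing on its own z test.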
--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc