[R-sig-ME] GLMM- relationship between AICc weight and random effects?
Craig DeMars
cdemars at ualberta.ca
Mon Jul 11 04:15:00 CEST 2016
I would urge caution when interpreting the estimates from
random-intercept-only GLMMs for RSFs. The standard errors of the fixed
effects do not reflect the animal as the sampling unit (see Schielzeth and
Forstmeier 2009). Thus, if your objective is to make inference to the
larger population of animals, the standard errors of the fixed effects are
far too small (i.e. overconfident). It is more appropriate to use two-stage
approaches or GLMMs with random slopes for the variables of interest.
Random-intercept-only models are somewhat abused in the RSF literature in
this regard, in my opinion.
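
As a rough illustration of the two-stage idea, here is a minimal base-R
sketch on simulated data. Every object name and parameter value in it is
made up for illustration and does not come from any real data set. Stage 1
fits a logistic regression to each animal separately; stage 2 treats the
per-animal slopes as the sample. The pooled fit, with a fixed intercept per
animal standing in for the random intercept (an assumption made to avoid
package dependencies), is included only for contrast.

```r
set.seed(1)

# Simulated used/available data: 20 animals, 300 points each.
# All names and values below are hypothetical.
n_animals <- 20
n_pts     <- 300
beta_i    <- rnorm(n_animals, mean = 0.3, sd = 0.5)  # animal-specific slopes
alpha_i   <- rnorm(n_animals, mean = 0,   sd = 1)    # animal-specific intercepts

dat <- do.call(rbind, lapply(seq_len(n_animals), function(i) {
  x <- rnorm(n_pts)                                  # z-scored covariate
  p <- plogis(alpha_i[i] + beta_i[i] * x)
  data.frame(ID = i, x = x, Used = rbinom(n_pts, 1, p))
}))
dat$ID <- factor(dat$ID)

# Stage 1: one logistic regression per animal, keeping the slope for x.
stage1 <- sapply(split(dat, dat$ID), function(d) {
  coef(glm(Used ~ x, family = binomial, data = d))[["x"]]
})

# Stage 2: the animal is the sampling unit, so the population-level
# standard error comes from the spread of the per-animal slopes.
two_stage_mean <- mean(stage1)
two_stage_se   <- sd(stage1) / sqrt(n_animals)

# For contrast: pooled fit with a separate intercept per animal but one
# shared slope; its standard error treats every location as independent.
pooled    <- glm(Used ~ x + ID, family = binomial, data = dat)
pooled_se <- summary(pooled)$coefficients["x", "Std. Error"]

print(c(two_stage_mean = two_stage_mean,
        two_stage_se   = two_stage_se,
        pooled_se      = pooled_se))
```

In a simulation like this, where the slopes really do vary among animals,
the two-stage standard error should come out well above the pooled one.
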
On Sun, Jul 10, 2016 at 7:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>
> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
> > Dear list members,
> >
> > I am developing GLMMs to assess habitat selection (using the GLMM
> > coefficients to construct resource selection functions). I have
> > (telemetry) data from 5 study areas, and each area has a different
> > number of monitored individuals.
> >
> > In the GLMMs, the dependent variable is binary (1 = used locations;
> > 0 = available locations), and I have an initial set of 14 continuous
> > variables (8 land cover variables; 2 distance variables, to artificial
> > areas and water sources; 4 topographic variables). A buffer was placed
> > around each location and the area of each land cover within that buffer
> > was calculated; distances were measured from each point to the nearest
> > feature; and topographic variables were obtained from DEM rasters. I
> > tested for correlation using Spearman's rank correlation, so not all 14
> > were used in the GLMMs. All variables were z-score transformed.
> >
> > As the random effect, I used individual ID. At the beginning I thought
> > of using study area as a random effect, but it had only 5 levels and
> > there was almost no variance when that random effect was used.
> >
> > I constructed a GLMM with 9 (uncorrelated) variables and a random
> > effect, then used the "dredge()" function and "model.avg(dredge)" to
> > rank models by AICc. This was the result (only models with delta AICc
> > below 2 are shown):
> >
> > [1] Call:
> > model.avg(object = dredge.m1.1)
> >
> > Component model call:
> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
> >       family = binomial(link = "logit"))
> >
> > Component models:
> >         df   logLik    AICc delta weight
> > 123578   8 -4309.94 8635.89  0.00   0.14
> > 1235789  9 -4309.22 8636.44  0.55   0.10
> > 123789   8 -4310.52 8637.04  1.14   0.08
> > 1235678  9 -4309.75 8637.50  1.61   0.06
> > 12378    7 -4311.78 8637.57  1.67   0.06
> > 1234578  9 -4309.79 8637.58  1.69   0.06
> >
> > Variables 1 and 2 are the distance variables, 3 to 8 are land cover
> > variables, and 9 is a topographic variable. The weights seem to be very
> > low, even if I average all those models, as seems to be common when
> > delta values are low.
>
> Well as far as we can tell from this, variables 4-9 aren't doing much
> (on the other hand, variables 1-3 seem to be in all of the top models
> you've shown us -- although presumably there are a bunch more models
> that are almost like these, and similar in weight, with other
> permutations of [123] + [some combination of 456789] ...)
>
>
> > Even with these weights, I constructed GLMMs for each of the
> > combinations, and the results were similar for all 6 combinations. Here
> > are the results for the first one (GLMM + overdispersion + r-squared):
> >
> > Random effects:
> >  Groups    Name        Variance Std.Dev.
> >  ID.CODE_1 (Intercept) 13.02    3.608
> > Number of obs: 32670, groups: ID.CODE_1, 55
> >
> > Fixed effects:
> >             Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -0.54891    0.51174  -1.073 0.283433
> > 3           -0.22232    0.04059  -5.478 4.31e-08 ***
> > 5           -0.05433    0.02837  -1.915 0.055460 .
> > 7           -0.13108    0.02825  -4.640 3.49e-06 ***
> > 8           -0.15864    0.08670  -1.830 0.067287 .
> > 1            0.28438    0.02853   9.968  < 2e-16 ***
> > 2            0.11531    0.03021   3.817 0.000135 ***
> >
> > Residual deviance: 0.256
> >
> > r.squaredGLMM():
> >        R2m        R2c
> > 0.01063077 0.80039950
> >
> > This is what I get from this analysis:
> >
> > 1) Variance and SD of the random effect seem fine (definitely better
> > than the "0" I got when using study area as the random effect);
>
> yes -- the SD of the random effect is much larger than any of the fixed
> effects, which means that the differences among individuals are large
> (presumably that means you have very different numbers of presences for
> different individuals [all individuals sharing a common pool of
> pseudo-absences?])
> >
> > 2) Estimate values make sense from what I know of the species and the
> > knowledge I have of the study areas;
>
> Good!
> >
> > 3) Overdispersion values seem good, and R-squared values don't seem very
> > good (at least when considering only fixed effects) but, as I read in
> > several places, AIC and r-squared are not always in agreement.
>
> Overdispersion is meaningless for binary data.
> >
> > 4) Weight values seem very low. Does it mean the models are not good?
>
> It means there are many approximately equivalent models. Nothing in
> this output tells you very much about absolute goodness of fit (which is
> tricky for binary data).
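
To put rough numbers on this point, here is a small base-R sketch using
only the six AICc values quoted in output [1]. The variable names are my
own. The renormalised weights cover these six models only, so they will not
match the dredge weights, which are spread over all 512 candidates; the
evidence ratios are the more telling quantity.

```r
# AICc values of the six top GLMMs quoted in output [1] of the thread.
aicc <- c("123578"  = 8635.89, "1235789" = 8636.44, "123789"  = 8637.04,
          "1235678" = 8637.50, "12378"   = 8637.57, "1234578" = 8637.58)

delta   <- aicc - min(aicc)      # AICc differences from the best model
rel_lik <- exp(-0.5 * delta)     # relative likelihood of each model

# Weights renormalised over these six models ONLY (not over all 512).
w_subset <- rel_lik / sum(rel_lik)

# Evidence ratio of the best model against each of the others.
evidence_ratio <- rel_lik[1] / rel_lik

print(round(cbind(delta, w_subset, evidence_ratio), 2))
```

Evidence ratios close to 1 say that the data barely distinguish these
models from one another, which is a statement about relative support and
not about how well any of them fits.
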
> >
> > What I did next was fit a GLM ("glm()"), so no random effect was
> > used. I used the same set of variables as in [1], and here are the
> > results (only models with delta AICc below 2 are shown):
> >
> > [2] Call:
> > model.avg(object = dredge.glm_m1.1)
> >
> > Component model call:
> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
> >     data = All_SA_Used_RP_Area_z)
> >
> > Component models:
> >           df   logLik     AICc delta weight
> > 12345678   9 -9251.85 18521.70  0.00   0.52
> > 123456789 10 -9251.77 18523.54  1.84   0.21
> > 1345678    8 -9253.84 18523.69  1.99   0.19
> >
> > In this case, weight values are higher.
> >
> > Does this mean that it is better not to use a random effect? (I am not
> > sure I can compare GLMM with GLM results; correct me if I am making
> > wrong assumptions.)
>
> No. You could do a likelihood ratio test with anova(), but note that
> the AICc values for the glm() fits are roughly 10,000 (!!) units higher
> than those of the glmer() fits.
>
> While it will potentially greatly complicate your life, I think you
> should at least *consider* interactions between your environment
> variables and ID, i.e. allow for the possibility that different
> individuals respond differently to habitat variation.
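
One way to express such an interaction in lme4's formula syntax is a random
slope, e.g. `(1 + x1 | ID)`. The sketch below uses simulated data, and all
object names and parameter values are hypothetical. It assumes the lme4
package is installed and skips the model fits if it is not.

```r
set.seed(42)

# Simulated used/available data; every name and value here is hypothetical.
n_animals <- 15
n_pts     <- 200
ID        <- factor(rep(seq_len(n_animals), each = n_pts))
idx       <- as.integer(ID)
x1        <- rnorm(n_animals * n_pts)           # z-scored covariate
a_i       <- rnorm(n_animals, 0,   1)[idx]      # per-animal intercepts
b_i       <- rnorm(n_animals, 0.3, 0.5)[idx]    # per-animal slopes
Used      <- rbinom(length(x1), 1, plogis(a_i + b_i * x1))
dat       <- data.frame(Used, x1, ID)

# Random intercept only, versus random intercept plus random slope for x1.
f_int   <- Used ~ x1 + (1 | ID)
f_slope <- Used ~ x1 + (1 + x1 | ID)

if (requireNamespace("lme4", quietly = TRUE)) {
  m_int   <- lme4::glmer(f_int,   data = dat, family = binomial)
  m_slope <- lme4::glmer(f_slope, data = dat, family = binomial)

  # Likelihood ratio test between the two nested random-effects structures.
  print(anova(m_int, m_slope))

  # Standard error of the x1 coefficient under each structure.
  print(c(se_int   = coef(summary(m_int))["x1", "Std. Error"],
          se_slope = coef(summary(m_slope))["x1", "Std. Error"]))
}
```

With real data and several covariates, a full set of correlated random
slopes can be hard to fit, so it may be necessary to restrict random slopes
to the variables of primary interest.
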
>
> Ben Bolker
--
Craig DeMars, Ph.D.
Postdoctoral Fellow
Department of Biological Sciences
University of Alberta
Phone: 780-221-3971