[R-sig-ME] GLMM- relationship between AICc weight and random effects?

Mon Jul 11 03:06:17 CEST 2016

On 16-07-09 07:20 PM, Teresa Oliveira wrote:
> Dear list members,
> 
> I am developing GLMM's in order to assess habitat selection (using GLMMs'
> coeficients to construct Resource selection functions). I have (telemetry)
> data from 5 study areas, and each area has a different number of
> individuals monitored.
> 
> To develop GLMM's, the dependend variable is binary (1-used locations;
> 0-available locations), and I have a initial set of 14 continuous variables
> (8 land cover variables; 2 distance variables, to artificial areas and
> water sources; 4 topographic variables): a buffer was placed around each
> location and the area of each land cover within that buffer was accounted
> for; distances were measured from each point to the nearest feature, and
> topographic variables were obtained using DEM rasters. I tested for
> correlation using Spearman's Rank, so not all 14 were used in the GLMMs.
> All variables were transformed using z-score.
> 
> As random effect, I used individual ID. I thought at the beggining to use
> study area as a random effect but I only had 5 levels and there was almost
> no variance when that random effect was used.
> 
> I constructed a GLMM with 9 variables (not correlated) and a random effect,
> then used "dredge()" function and "model.avg(dredge)" to sort models by AIC
> values. This was the result (only models of AICc lower than 2 represented):
> 
> [1]Call:
> model.avg(object = dredge.m1.1)
> 
> Component model call:
> glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
> family =
>      binomial(link = "logit"))
> 
> Component models:
>           df   logLik    AICc  delta weight
> 123578     8 -4309.94 8635.89   0.00   0.14
> 1235789    9 -4309.22 8636.44   0.55   0.10
> 123789     8 -4310.52 8637.04   1.14   0.08
> 1235678    9 -4309.75 8637.50   1.61   0.06
> 12378      7 -4311.78 8637.57   1.67   0.06
> 1234578    9 -4309.79 8637.58   1.69   0.06
> 
> Variables 1 and 2 represent the distance variables; from 3 to 8 land cover
> variables, and 9 is a topographic variable. Weights seem to be very low,
> even if I average all those models as it seems to be common when delta
> values are low. 

Well as far as we can tell from this, variables 4-9 aren't doing much
(on the other hand, variables 1-3 seem to be in all of the top models
you've shown us -- although presumably there are a bunch more models
that are almost like these, and similar in weight, with other
permutations of [123] + [some combination of 456789] ...)

Even with this weights, I constructed GLMMs for each of the
> combinations, and the results were simmilar for all 6 combinations. Here
> are the results for the first one (GLMM + overdispersion + r-squared):
> 
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  ID.CODE_1 (Intercept) 13.02    3.608
> Number of obs: 32670, groups:  ID.CODE_1, 55
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|),
> (Intercept) -0.54891    0.51174  -1.073 0.283433
> 3       -0.22232    0.04059  -5.478 4.31e-08 ***
> 5       -0.05433    0.02837  -1.915 0.055460 .
> 7       -0.13108    0.02825  -4.640 3.49e-06 ***
> 8       -0.15864    0.08670  -1.830 0.067287 .
> 1         0.28438    0.02853   9.968  < 2e-16 ***
> 2         0.11531    0.03021   3.817 0.000135 ***
> Residual deviance: 0.256
> r.squaredGLMM():
>        R2m        R2c
> 0.01063077 0.80039950
> This is what I get from this analysis:
> 
> 1) Variance and SD of the random effect seems fine (definitely better than
> the "0" I got when using Study Areas as random effect);

  yes -- SD of the random effects is much larger than any of the fixed
effects, which means that the differences among individuals are large
(presumably that means you have very different numbers of presences for
different number of individuals [all individuals sharing a common pool
of pseudo-absences ???)
> 
> 2) Estimate values make sense from what I know of the species and the
> knowledge I have of the study areas;

  Good!
> 
> 3) Overdispersion values seem good, and R-squared values don't seem very
> good (at least when considering only fixed effects) but, as I read in
> several places, AIC and r-squared are not always in agreement.

  Overdispersion is meaningless for binary data.
> 
> 4) Weight values seem very low. Does it mean the models are not good?

  It means there are many approximately equivalent models.  Nothing in
this output tells you very much about absolute goodness of fit (which is
tricky for binary data).
> 
> Then what I did was construct a GLM ("glm()"), so no random effect was
> used. I used the same set of variables used in [1], and here are the
> results (only models of AICc lower than 2 represented):
> 
> [2] Call:
> model.avg(object = dredge.glm_m1.1)
> 
> Component model call:
> glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
> data =
>      All_SA_Used_RP_Area_z)
> 
> Component models:
>           df   logLik     AICc   delta weight
> 12345678   9 -9251.85 18521.70    0.00   0.52
> 123456789 10 -9251.77 18523.54    1.84   0.21
> 1345678    8 -9253.84 18523.69    1.99   0.19
> 
> In this case, weight values are higher.
> 
> Does this mean that it is better not to use a random effect? (I am not sure
> I can compare GLMM with GLM results, correct me if I am doing wrong
> assumptions)

  No.  You could do a likelihood ratio test with anova(), but note that
the AICc values for the glm() fits are 10,000 (!!) units higher than the
glmer fits.

  While it will potentially greatly complicate your life, I think you
should at least *consider* interactions between your environment
variables and ID, i.e. allow for the possibility that different
individuals respond differently to habitat variation.

  Ben Bolker