[R-sig-ME] GLMM- relationship between AICc weight and random effects?
Teresa Oliveira
mteresaoliveira92 at gmail.com
Mon Jul 11 11:53:38 CEST 2016
Regarding SD: yes, I have several individuals with different numbers of
locations. But, for instance, when I consider study areas, I get a very
low variance (usually between 0 and 0.2, with an SD that is also 0 or,
for instance, 0.3 when the variance is 0.1). Shouldn't I still consider
study area as a random effect? I mean, is the variance value enough to
include or exclude a random effect?
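One way to go beyond eyeballing the variance is a likelihood-ratio test
between fits with and without the study-area term. A minimal sketch (the
object and variable names `dat`, `x1`, `x2`, `Area` are hypothetical,
and it assumes the lme4 package):

```r
library(lme4)

## Hypothetical fits: random intercept for individual only, versus
## individual plus study area
m_id   <- glmer(Used ~ x1 + x2 + (1 | ID), data = dat, family = binomial)
m_both <- glmer(Used ~ x1 + x2 + (1 | ID) + (1 | Area),
                data = dat, family = binomial)

## Likelihood-ratio test; note it is conservative here, because the null
## hypothesis (variance = 0) lies on the boundary of the parameter space
anova(m_id, m_both)
```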
How should I test for goodness of fit?
Regarding interactions: yes, I may construct models including interactions
between ID and the other variables, but I think it is better to first find a
"strong" global model and identify which variables are most important.
Thank you very much for your help!!
Teresa
2016-07-11 2:06 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
>
>
> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
> > Dear list members,
> >
> > I am developing GLMMs in order to assess habitat selection (using GLMM
> > coefficients to construct resource selection functions). I have
> > telemetry data from 5 study areas, and each area has a different
> > number of individuals monitored.
> >
> > To develop the GLMMs, the dependent variable is binary (1 = used
> > locations; 0 = available locations), and I have an initial set of 14
> > continuous variables (8 land cover variables; 2 distance variables, to
> > artificial areas and water sources; 4 topographic variables): a buffer
> > was placed around each location and the area of each land cover within
> > that buffer was accounted for; distances were measured from each point
> > to the nearest feature; and topographic variables were obtained from
> > DEM rasters. I tested for correlation using Spearman's rank, so not
> > all 14 were used in the GLMMs. All variables were z-score standardized.
> >
> > As random effect, I used individual ID. At the beginning I thought of
> > using study area as a random effect, but I only had 5 levels and there
> > was almost no variance when that random effect was used.
> >
> > I constructed a GLMM with 9 (uncorrelated) variables and a random
> > effect, then used the "dredge()" function and "model.avg(dredge)" to
> > sort models by AICc values. This was the result (only models with
> > delta AICc lower than 2 are shown):
> >
> > [1]Call:
> > model.avg(object = dredge.m1.1)
> >
> > Component model call:
> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
> > family =
> > binomial(link = "logit"))
> >
> > Component models:
> > df logLik AICc delta weight
> > 123578 8 -4309.94 8635.89 0.00 0.14
> > 1235789 9 -4309.22 8636.44 0.55 0.10
> > 123789 8 -4310.52 8637.04 1.14 0.08
> > 1235678 9 -4309.75 8637.50 1.61 0.06
> > 12378 7 -4311.78 8637.57 1.67 0.06
> > 1234578 9 -4309.79 8637.58 1.69 0.06
> >
> > Variables 1 and 2 represent the distance variables; 3 to 8 are land
> > cover variables, and 9 is a topographic variable. Weights seem to be
> > very low, even if I average all those models, as seems to be common
> > practice when delta values are low.
>
> Well as far as we can tell from this, variables 4-9 aren't doing much
> (on the other hand, variables 1-3 seem to be in all of the top models
> you've shown us -- although presumably there are a bunch more models
> that are almost like these, and similar in weight, with other
> permutations of [123] + [some combination of 456789] ...)
>
>
> > Even with these weights, I constructed GLMMs for each of the
> > combinations, and the results were similar for all 6 combinations. Here
> > are the results for the first one (GLMM + overdispersion + r-squared):
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > ID.CODE_1 (Intercept) 13.02 3.608
> > Number of obs: 32670, groups: ID.CODE_1, 55
> >
> > Fixed effects:
> > Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -0.54891 0.51174 -1.073 0.283433
> > 3 -0.22232 0.04059 -5.478 4.31e-08 ***
> > 5 -0.05433 0.02837 -1.915 0.055460 .
> > 7 -0.13108 0.02825 -4.640 3.49e-06 ***
> > 8 -0.15864 0.08670 -1.830 0.067287 .
> > 1 0.28438 0.02853 9.968 < 2e-16 ***
> > 2 0.11531 0.03021 3.817 0.000135 ***
> > Residual deviance: 0.256
> > r.squaredGLMM():
> > R2m R2c
> > 0.01063077 0.80039950
> > This is what I get from this analysis:
> >
> > 1) Variance and SD of the random effect seem fine (definitely better
> > than the "0" I got when using study areas as the random effect);
>
> yes -- the SD of the random effect is much larger than any of the fixed
> effects, which means that the differences among individuals are large
> (presumably that means you have very different numbers of presences for
> different individuals [all individuals sharing a common pool of
> pseudo-absences???])
> >
> > 2) Estimate values make sense from what I know of the species and the
> > knowledge I have of the study areas;
>
> Good!
> >
> > 3) Overdispersion values seem good, and R-squared values don't seem very
> > good (at least when considering only fixed effects) but, as I read in
> > several places, AIC and r-squared are not always in agreement.
>
> Overdispersion is meaningless for binary data.
> >
> > 4) Weight values seem very low. Does it mean the models are not good?
>
> It means there are many approximately equivalent models. Nothing in
> this output tells you very much about absolute goodness of fit (which is
> tricky for binary data).
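One practical option for a goodness-of-fit check with binary responses
(my suggestion, not something from this thread, and assuming the DHARMa
package is available; `m1` stands for a hypothetical fitted glmer model):

```r
library(DHARMa)  # simulation-based residual diagnostics

## Simulate scaled residuals from the fitted model; these remain
## interpretable even for binary data, unlike raw deviance residuals
sims <- simulateResiduals(fittedModel = m1, n = 1000)

plot(sims)            # QQ plot and residual-vs-predicted checks
testUniformity(sims)  # formal test of residual uniformity
```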
> >
> > Then what I did was construct a GLM ("glm()"), so no random effect was
> > used. I used the same set of variables used in [1], and here are the
> > results (only models of AICc lower than 2 represented):
> >
> > [2] Call:
> > model.avg(object = dredge.glm_m1.1)
> >
> > Component model call:
> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
> > data =
> > All_SA_Used_RP_Area_z)
> >
> > Component models:
> > df logLik AICc delta weight
> > 12345678 9 -9251.85 18521.70 0.00 0.52
> > 123456789 10 -9251.77 18523.54 1.84 0.21
> > 1345678 8 -9253.84 18523.69 1.99 0.19
> >
> > In this case, weight values are higher.
> >
> > Does this mean that it is better not to use a random effect? (I am not
> > sure I can compare GLMM with GLM results; correct me if I am making
> > the wrong assumptions)
>
> No. You could do a likelihood ratio test with anova(), but note that
> the AICc values for the glm() fits are 10,000 (!!) units higher than the
> glmer fits.
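For concreteness, a base-R sketch of that comparison, using round numbers
like the log-likelihoods in the dredge tables above (the numbers are
illustrative, and halving the p-value for a variance tested on the
boundary is a common rule of thumb, not something prescribed in this
thread):

```r
## Illustrative log-likelihoods, similar to the tables above
ll_glmm <- -4309.9  # best glmer() fit (with random effect)
ll_glm  <- -9251.9  # best glm() fit (without random effect)

## LRT statistic for dropping the single random-effect variance
lr_stat <- 2 * (ll_glmm - ll_glm)

## The null value (variance = 0) sits on the boundary of the parameter
## space, so a common correction is to halve the chi-square p-value
p_val <- pchisq(lr_stat, df = 1, lower.tail = FALSE) / 2

cat("LR statistic:", lr_stat, " p-value:", p_val, "\n")
```

With a gap that large, the test overwhelmingly favors keeping the random
effect.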
>
> While it will potentially greatly complicate your life, I think you
> should at least *consider* interactions between your environment
> variables and ID, i.e. allow for the possibility that different
> individuals respond differently to habitat variation.
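A sketch of what such an ID-by-environment interaction could look like as
a random slope (variable names `x1`, `x2` and data frame `dat` are
hypothetical, assuming lme4):

```r
library(lme4)

## Random intercept plus a random slope for x1 across individuals:
## each individual gets its own response to that habitat variable
m_slope <- glmer(Used ~ x1 + x2 + (1 + x1 | ID),
                 data = dat, family = binomial)

## Compare against the random-intercept-only fit
m_int <- glmer(Used ~ x1 + x2 + (1 | ID),
               data = dat, family = binomial)
anova(m_int, m_slope)
```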
>
> Ben Bolker
>