[R-sig-ME] GLMM- relationship between AICc weight and random effects?
Teresa Oliveira
mteresaoliveira92 at gmail.com
Mon Jul 11 12:01:17 CEST 2016
So, you are saying that the estimates I get for the fixed effects, and
their standard errors, do not take the random intercept (in this case, ID)
into account?
What do you mean by "2-stage approaches"?
Regarding random slopes, are they represented by, for instance,
(Variable1 | ID)?
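That is, would the model look something like this? (A sketch, with a
hypothetical predictor V1 and data frame dat:)

## random intercept only
glmer(Used ~ V1 + (1 | ID), data = dat, family = binomial)
## random intercept plus a random slope for V1 -- (V1 | ID) is
## shorthand for (1 + V1 | ID)
glmer(Used ~ V1 + (V1 | ID), data = dat, family = binomial)
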
I have seen a lot of studies where GLMMs are used to construct RSFs, but I
have also read that they are not appropriate for RSFs; no one seems sure
about the right approach...
Thank you very much for your help!!
Teresa
2016-07-11 3:15 GMT+01:00 Craig DeMars <cdemars at ualberta.ca>:
> I would add some caution when interpreting the estimated values when using
> random-intercept only GLMMs for RSFs. The standard errors for the fixed
> effects do not reflect the animal as the sampling unit (see Schielzeth and
> Forstmeier 2009). Thus, if your objective is to make inference to the
> larger population of animals, the standard errors for the fixed effects are
> far too narrow (i.e. overconfident). It is more appropriate to use 2-stage
> approaches or GLMMs that use random slopes for the variables of interest.
>
> Random-intercept-only models are somewhat abused in the RSF literature in
> this regard, in my opinion.
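>
> For example, a minimal 2-stage sketch (with a hypothetical data frame
> "dat", predictors V1 and V2, and animal identifier ID; here the standard
> errors do treat the animal as the sampling unit):
>
> ## stage 1 -- fit a separate RSF to each animal
> fits <- lapply(split(dat, dat$ID), function(d)
>     glm(Used ~ V1 + V2, family = binomial(link = "logit"), data = d))
> coefs <- t(sapply(fits, coef))   # one row of coefficients per animal
>
> ## stage 2 -- population-level means and SEs across animals
> colMeans(coefs)
> apply(coefs, 2, sd) / sqrt(nrow(coefs))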
>
> On Sun, Jul 10, 2016 at 7:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>>
>>
>> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
>> > Dear list members,
>> >
>> > I am developing GLMMs in order to assess habitat selection (using the
>> > GLMMs' coefficients to construct resource selection functions, RSFs). I
>> > have telemetry data from 5 study areas, and each area has a different
>> > number of monitored individuals.
>> >
>> > To develop the GLMMs, the dependent variable is binary (1 = used
>> > locations; 0 = available locations), and I have an initial set of 14
>> > continuous variables (8 land cover variables; 2 distance variables, to
>> > artificial areas and to water sources; and 4 topographic variables): a
>> > buffer was placed around each location and the area of each land cover
>> > type within that buffer was calculated; distances were measured from
>> > each point to the nearest feature; and topographic variables were
>> > obtained from DEM rasters. I tested for correlation using Spearman's
>> > rank, so not all 14 were used in the GLMMs. All variables were
>> > standardized using z-scores.
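>> >
>> > (Roughly, the screening step looked like this -- a sketch with
>> > hypothetical names dat and V1...V14:)
>> >
>> > vars <- paste0("V", 1:14)
>> > dat[vars] <- scale(dat[vars])           # z-score standardization
>> > cor(dat[vars], method = "spearman")     # inspect; drop one variable
>> >                                         # from each highly correlated pair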
>> >
>> > As the random effect, I used individual ID. At the beginning I thought
>> > of using study area as a random effect, but I only had 5 levels and
>> > there was almost no variance when that random effect was used.
>> >
>> > I constructed a GLMM with the 9 (uncorrelated) variables and a random
>> > effect, then used the "dredge()" function and "model.avg(dredge)" to
>> > rank models by AICc (the workflow is sketched after the output). This
>> > was the result (only models with delta AICc lower than 2 shown):
>> >
>> > [1]Call:
>> > model.avg(object = dredge.m1.1)
>> >
>> > Component model call:
>> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
>> > family =
>> > binomial(link = "logit"))
>> >
>> > Component models:
>> > df logLik AICc delta weight
>> > 123578 8 -4309.94 8635.89 0.00 0.14
>> > 1235789 9 -4309.22 8636.44 0.55 0.10
>> > 123789 8 -4310.52 8637.04 1.14 0.08
>> > 1235678 9 -4309.75 8637.50 1.61 0.06
>> > 12378 7 -4311.78 8637.57 1.67 0.06
>> > 1234578 9 -4309.79 8637.58 1.69 0.06
>> >
>> > Variables 1 and 2 represent the distance variables; variables 3 to 8
>> > are land cover variables; and 9 is a topographic variable. The weights
>> > seem very low, even if I average all those models, as seems to be
>> > common practice when delta values are low.
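>> >
>> > (The workflow, roughly -- a sketch assuming predictor names V1...V9:)
>> >
>> > library(lme4)
>> > library(MuMIn)
>> >
>> > ## global model; dredge() requires na.action = na.fail
>> > m1.1 <- glmer(Used ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 +
>> >                   (1 | ID.CODE_1),
>> >               data = All_SA_Used_RP_Area_z,
>> >               family = binomial(link = "logit"), na.action = na.fail)
>> >
>> > dredge.m1.1 <- dredge(m1.1)        # fits all 2^9 = 512 submodels
>> > model.avg(dredge.m1.1)             # AICc-ranked model averaging
>> > subset(dredge.m1.1, delta < 2)     # top models only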
>>
>> Well, as far as we can tell from this, variables 4-9 aren't doing much
>> (on the other hand, variables 1-3 seem to be in all of the top models
>> you've shown us -- although presumably there are a bunch more models
>> that are almost like these, and similar in weight, with other
>> permutations of [123] + [some combination of 456789] ...)
>>
>>
>> > Even with these weights, I constructed GLMMs for each of the
>> > combinations, and the results were similar for all 6 combinations. Here
>> > are the results for the first one (GLMM + overdispersion + r-squared):
>> >
>> > Random effects:
>> >  Groups    Name        Variance Std.Dev.
>> >  ID.CODE_1 (Intercept) 13.02    3.608
>> > Number of obs: 32670, groups: ID.CODE_1, 55
>> >
>> > Fixed effects:
>> >              Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) -0.54891    0.51174  -1.073  0.283433
>> > 3           -0.22232    0.04059  -5.478  4.31e-08 ***
>> > 5           -0.05433    0.02837  -1.915  0.055460 .
>> > 7           -0.13108    0.02825  -4.640  3.49e-06 ***
>> > 8           -0.15864    0.08670  -1.830  0.067287 .
>> > 1            0.28438    0.02853   9.968   < 2e-16 ***
>> > 2            0.11531    0.03021   3.817  0.000135 ***
>> >
>> > Residual deviance: 0.256
>> > r.squaredGLMM():
>> >        R2m        R2c
>> > 0.01063077 0.80039950
>> > This is what I take from this analysis:
>> >
>> > 1) The variance and SD of the random effect seem fine (definitely
>> > better than the "0" I got when using study area as the random effect);
>>
>> yes -- the SD of the random effect is much larger than any of the fixed
>> effects, which means that the differences among individuals are large
>> (presumably that means you have very different numbers of presences for
>> different individuals [all individuals sharing a common pool of
>> pseudo-absences???]).
>> >
>> > 2) The estimates make sense given what I know of the species and of
>> > the study areas;
>>
>> Good!
>> >
>> > 3) The overdispersion values seem good, and the R-squared values don't
>> > seem very good (at least when considering only the fixed effects), but,
>> > as I have read in several places, AIC and R-squared are not always in
>> > agreement.
>>
>> Overdispersion is meaningless for binary data.
>> >
>> > 4) The weight values seem very low. Does this mean the models are not
>> > good?
>>
>> It means there are many approximately equivalent models. Nothing in
>> this output tells you very much about absolute goodness of fit (which is
>> tricky for binary data).
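>>
>> (One rough check for binary data is a binned residual plot, e.g. via the
>> 'arm' package -- a sketch, assuming the fitted glmer model is m1.1:)
>>
>> library(arm)
>> binnedplot(fitted(m1.1), resid(m1.1, type = "response"))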
>> >
>> > Then what I did was to construct a GLM ("glm()"), so no random effect
>> > was used. I used the same set of variables as in [1], and here are the
>> > results (only models with delta AICc lower than 2 shown):
>> >
>> > [2] Call:
>> > model.avg(object = dredge.glm_m1.1)
>> >
>> > Component model call:
>> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
>> > data = All_SA_Used_RP_Area_z)
>> >
>> > Component models:
>> > df logLik AICc delta weight
>> > 12345678 9 -9251.85 18521.70 0.00 0.52
>> > 123456789 10 -9251.77 18523.54 1.84 0.21
>> > 1345678 8 -9253.84 18523.69 1.99 0.19
>> >
>> > In this case, the weight values are higher.
>> >
>> > Does this mean that it is better not to use a random effect? (I am not
>> > sure I can compare GLMM with GLM results; correct me if I am making the
>> > wrong assumptions.)
>>
>> No. You could do a likelihood ratio test with anova(), but note that
>> the AICc values for the glm() fits are 10,000 (!!) units higher than
>> those for the glmer() fits.
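>>
>> (A sketch of that test, assuming m_glmm and m_glm are the fitted glmer()
>> and glm() models with the same fixed effects; the variance component is
>> tested on the boundary of its parameter space, so the naive p-value is
>> conservative:)
>>
>> LR <- 2 * (as.numeric(logLik(m_glmm)) - as.numeric(logLik(m_glm)))
>> pchisq(LR, df = 1, lower.tail = FALSE)   # halve this p-value to roughly
>>                                          # correct for the boundary issue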
>>
>> While it will potentially greatly complicate your life, I think you
>> should at least *consider* interactions between your environment
>> variables and ID, i.e. allow for the possibility that different
>> individuals respond differently to habitat variation.
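>>
>> (A sketch of what that might look like, with hypothetical predictors V1
>> and V2; each added random slope adds variance and covariance parameters,
>> so the fit gets slower and harder:)
>>
>> glmer(Used ~ V1 + V2 + (1 + V1 + V2 | ID.CODE_1),
>>       data = All_SA_Used_RP_Area_z,
>>       family = binomial(link = "logit"))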
>>
>> Ben Bolker
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
>
> --
> Craig DeMars, Ph.D.
> Postdoctoral Fellow
> Department of Biological Sciences
> University of Alberta
> Phone: 780-221-3971
>
>