[R-sig-ME] GLMM- relationship between AICc weight and random effects?

Teresa Oliveira mteresaoliveira92 at gmail.com
Mon Jul 11 21:07:12 CEST 2016


Thank you very much for your help! It makes more sense now why I get such
low variance estimates (which would be nice if they were "true")...

So, no matter which random effect I use, the standard errors will always be
based on the number of locations?

A related problem, regarding random effects:
The random effects I want to consider, and the results I get, are confusing
me. At first I wanted to include study area as a random effect (RE),
because I thought it would be sensible to allow for possible differences
between study areas. However, I got very low variance values together with
a high SD.
The other two REs I wanted to consider were based on individual ID; they
differed in how "availability" was defined. One way assumed that 1) the
whole study area was available to each animal (so a set of "available"
units was randomly selected for each animal, which I think is the most
common approach); the other assumed that 2) there were zones (within home
ranges) not available to all animals. I only considered one of these two
definitions at a time.
When I use study area, or individual ID under assumption 1), the variance
is 0 or very low. When I use individual ID under assumption 2), the
variance is higher and the SD is small.
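
To make this concrete, here is a minimal sketch of the three
specifications I am comparing (lme4 syntax; the model names are
placeholders, and the fixed effects are just those from the outputs below):

library(lme4)

# (a) study area as random intercept:
m_area <- glmer(Used ~ LC2_z + LC3_z + LC8_z + DH_z + DW_z + TPI_z +
                  (1 | Study.Area),
                data = All_SA_Used_RP_Area_z,
                family = binomial(link = "logit"))

# (b) individual ID, with availability defined as in 1):
m_id1 <- glmer(Used ~ LC2_z + LC3_z + LC8_z + DH_z + DW_z + TPI_z +
                 (1 | ID_1),
               data = All_SA_Used_RP_Area_z,
               family = binomial(link = "logit"))

# (c) individual ID, with availability defined as in 2); the available
# points differ, so this uses a different data set (name hypothetical):
m_id2 <- glmer(Used ~ LC2_z + LC3_z + LC8_z + DH_z + DW_z + TPI_z +
                 (1 | ID.CODE_1),
               data = All_SA_Used_RP_Area_z_2,
               family = binomial(link = "logit"))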

For instance, here is the result I got for one model (the one with the
highest AIC weight), using study area as the RE:

"AIC      BIC   logLik deviance df.resid
 18928.6  18995.8  -9456.3  18912.6    32662

Random effects:
 Groups     Name        Variance Std.Dev.
 Study.Area (Intercept) 0.3208   0.5664
Number of obs: 32670, groups:  Study.Area, 5

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.79438    0.25637 -10.900  < 2e-16 ***
LC2_z       -0.60546    0.02487 -24.348  < 2e-16 ***
LC3_z       -0.11825    0.03205  -3.690 0.000224 ***
LC8_z        0.06528    0.01908   3.422 0.000622 ***
DH_z         0.30449    0.02236  13.620  < 2e-16 ***
DW_z         0.04872    0.02089   2.332 0.019695 *
TPI_z        0.02082    0.01745   1.193 0.232845         "


Someone did advise me, though, not to include study area as a RE, because
it has only five levels.
This model has a high AIC (compared with the other models); the
random-effect variance is low and its SD high, but the standard error of
the estimates, looking at the intercept, is low.

Another example, this time considering ID (1) as a RE:

"AIC      BIC   logLik deviance df.resid
 18978.7  19045.8  -9481.3  18962.7    32662

Random effects:
 Groups Name        Variance Std.Dev.
 ID_1   (Intercept) 0.165    0.4063
Number of obs: 32670, groups:  ID_1, 26

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.60211    0.08917 -29.180  < 2e-16 ***
LC2_z       -0.60123    0.02483 -24.210  < 2e-16 ***
LC3_z       -0.12135    0.03205  -3.786 0.000153 ***
LC8_z        0.05541    0.01915   2.893 0.003817 **
DH_z         0.29363    0.02296  12.789  < 2e-16 ***
DW_z         0.05232    0.02103   2.487 0.012875 *
TPI_z        0.02117    0.01746   1.213 0.225280        "

This result is similar to the first one.

One last example, this time considering ID (2) as a RE:

"  AIC      BIC   logLik deviance df.resid
  8641.4   8708.6  -4312.7   8625.4    32662

Random effects:
 Groups    Name        Variance Std.Dev.
 ID.CODE_1 (Intercept) 13.09    3.618
Number of obs: 32670, groups:  ID.CODE_1, 55

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.53513    0.51271  -1.044 0.296615
LC2_z       -0.19914    0.03782  -5.265 1.40e-07 ***
LC3_z       -0.02111    0.04195  -0.503 0.614810
LC8_z       -0.12183    0.02767  -4.403 1.07e-05 ***
DH_z         0.28389    0.02853   9.951  < 2e-16 ***
DW_z         0.11496    0.03021   3.806 0.000141 ***
TPI_z       -0.04058    0.02574  -1.576 0.114972              "


This time the AIC is lower and the variance higher, but the standard error
of the intercept estimate is very high!

Based on this, which model should be more useful? How should I interpret a
"good" variance/SD for the random effect combined with a "bad" standard
error for the intercept estimate, and the reverse?
And honestly, I find it very strange that the study areas are not
different, and I don't know whether I should include study area anyway or
not.


2016-07-11 17:40 GMT+01:00 Craig DeMars <cdemars at ualberta.ca>:

> In a telemetry-based RSF using a random-intercept-only GLMM, the standard
> errors are calculated as if the individual telemetry location (or
> random/available location) were the sampling unit, not the animal. That
> is partly why some studies using this approach report incredibly small
> p-values for their fixed effects when they have only, say, a sample of
> < 30 animals. Such results are highly implausible given the small sample
> of actual animals, and given that most wildlife populations show some
> degree of among-individual variability in selection. The
> random-intercept-only approach does not appropriately reflect this
> individual-level variation. A random-slope model estimates slopes at the
> level of the individual animal, so its standard errors do reflect
> individual-level variation. The drawback of these models is that you are
> often limited in the number of variables you can specify as random
> slopes.
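>
> In lme4 syntax the distinction looks like this (a sketch; the variable,
> model, and data names are placeholders):
>
> library(lme4)
> # Random intercept only: SEs effectively treat locations as the
> # sampling unit.
> m_ri <- glmer(Used ~ LC2_z + DH_z + (1 | ID),
>               data = dat, family = binomial)
> # Random slopes: fixed-effect SEs now reflect among-animal variation
> # in the responses to LC2_z and DH_z.
> m_rs <- glmer(Used ~ LC2_z + DH_z + (1 + LC2_z + DH_z | ID),
>               data = dat, family = binomial)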
>
> In a 2-stage approach, you fit a GLM to each individual animal.
> Population-level inferences are obtained by averaging the parameter
> estimates across individuals. You can account for differences in sample
> size per individual by weighting the averages by sample size or by the
> inverse of the variance (giving more weight to individuals with more
> precise estimates). I can guarantee that the resulting standard errors
> will be more conservative (and more appropriate) than those calculated
> from a random-intercept GLMM.
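>
> In code, the second stage might look like this (a base-R sketch for a
> single coefficient; 'dat' and the column names are placeholders):
>
> fits <- lapply(split(dat, dat$ID),
>                function(d) glm(Used ~ LC2_z + DH_z,
>                                family = binomial, data = d))
> b  <- sapply(fits, function(f) coef(f)["LC2_z"])
> se <- sapply(fits, function(f)
>   summary(f)$coefficients["LC2_z", "Std. Error"])
> w      <- 1 / se^2             # inverse-variance weights
> b_pop  <- sum(w * b) / sum(w)  # population-level estimate
> se_pop <- sqrt(1 / sum(w))     # its standard error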
>
> On Mon, Jul 11, 2016 at 4:01 AM, Teresa Oliveira <
> mteresaoliveira92 at gmail.com> wrote:
>
>> So, you are saying that the fixed-effect estimates and standard errors
>> I get do not take the random intercept (in this case, ID) into account?
>>
>> What do you mean by "2-stage approaches"?
>> Regarding random slopes, are they represented by, for instance,
>> (Variable1 | ID)?
>>
>> I have seen many studies that use GLMMs to construct RSFs, but I have
>> also read that they are not appropriate for RSFs; no one seems sure
>> about the right approach...
>>
>> Thank you very much for your help!!
>> Teresa
>>
>> 2016-07-11 3:15 GMT+01:00 Craig DeMars <cdemars at ualberta.ca>:
>>
>>> I would urge some caution in interpreting the estimates from
>>> random-intercept-only GLMMs for RSFs. The standard errors for the fixed
>>> effects do not treat the animal as the sampling unit (see Schielzeth
>>> and Forstmeier 2009). Thus, if your objective is to make inferences
>>> about the larger population of animals, the standard errors for the
>>> fixed effects are far too narrow (i.e., overconfident). It is more
>>> appropriate to use 2-stage approaches or GLMMs with random slopes for
>>> the variables of interest.
>>>
>>> Random-intercept-only models are somewhat abused in the RSF literature
>>> in this regard, in my opinion...
>>>
>>> On Sun, Jul 10, 2016 at 7:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
>>>> > Dear list members,
>>>> >
>>>> > I am developing GLMMs in order to assess habitat selection (using the
>>>> > GLMMs' coefficients to construct resource selection functions). I
>>>> > have (telemetry) data from 5 study areas, and each area has a
>>>> > different number of monitored individuals.
>>>> >
>>>> > To develop the GLMMs, the dependent variable is binary (1 = used
>>>> > locations; 0 = available locations), and I have an initial set of 14
>>>> > continuous variables (8 land cover variables; 2 distance variables,
>>>> > to artificial areas and to water sources; and 4 topographic
>>>> > variables): a buffer was placed around each location and the area of
>>>> > each land cover type within that buffer was computed; distances were
>>>> > measured from each point to the nearest feature; and topographic
>>>> > variables were obtained from DEM rasters. I tested for correlation
>>>> > using Spearman's rank, so not all 14 variables were used in the
>>>> > GLMMs. All variables were transformed using z-scores.
>>>> >
>>>> > As a random effect, I used individual ID. At the beginning I thought
>>>> > of using study area as a random effect, but it has only 5 levels and
>>>> > there was almost no variance when that random effect was used.
>>>> >
>>>> > I constructed a GLMM with 9 (uncorrelated) variables and a random
>>>> > effect, then used the "dredge()" function and "model.avg(dredge)" to
>>>> > sort the models by AICc. This was the result (only models with delta
>>>> > AICc lower than 2 shown):
>>>> >
>>>> > [1]Call:
>>>> > model.avg(object = dredge.m1.1)
>>>> >
>>>> > Component model call:
>>>> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
>>>> >     family = binomial(link = "logit"))
>>>> >
>>>> > Component models:
>>>> >           df   logLik    AICc  delta weight
>>>> > 123578     8 -4309.94 8635.89   0.00   0.14
>>>> > 1235789    9 -4309.22 8636.44   0.55   0.10
>>>> > 123789     8 -4310.52 8637.04   1.14   0.08
>>>> > 1235678    9 -4309.75 8637.50   1.61   0.06
>>>> > 12378      7 -4311.78 8637.57   1.67   0.06
>>>> > 1234578    9 -4309.79 8637.58   1.69   0.06
>>>> >
>>>> > Variables 1 and 2 are the distance variables; 3 to 8 are land cover
>>>> > variables; and 9 is a topographic variable. The weights seem very
>>>> > low, even if I average all of those models, as seems to be common
>>>> > practice when delta values are low.
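>>>> >
>>>> > For reference, this workflow in code (a sketch using MuMIn, assuming
>>>> > the global glmer fit is named m1.1):
>>>> >
>>>> > library(MuMIn)
>>>> > options(na.action = "na.fail")  # must be set before fitting m1.1
>>>> > dredge.m1.1 <- dredge(m1.1)     # all-subsets model selection
>>>> > model.avg(dredge.m1.1, subset = delta < 2)  # average top models only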
>>>>
>>>> Well as far as we can tell from this, variables 4-9 aren't doing much
>>>> (on the other hand, variables 1-3 seem to be in all of the top models
>>>> you've shown us -- although presumably there are a bunch more models
>>>> that are almost like these, and similar in weight, with other
>>>> permutations of [123] + [some combination of 456789] ...)
>>>>
>>>>
>>>> > Even with these weights, I constructed GLMMs for each of the
>>>> > combinations, and the results were similar for all 6 combinations.
>>>> > Here are the results for the first one (GLMM + overdispersion +
>>>> > r-squared):
>>>> >
>>>> > Random effects:
>>>> >  Groups    Name        Variance Std.Dev.
>>>> >  ID.CODE_1 (Intercept) 13.02    3.608
>>>> > Number of obs: 32670, groups:  ID.CODE_1, 55
>>>> >
>>>> > Fixed effects:
>>>> >             Estimate Std. Error z value Pr(>|z|)
>>>> > (Intercept) -0.54891    0.51174  -1.073 0.283433
>>>> > 3       -0.22232    0.04059  -5.478 4.31e-08 ***
>>>> > 5       -0.05433    0.02837  -1.915 0.055460 .
>>>> > 7       -0.13108    0.02825  -4.640 3.49e-06 ***
>>>> > 8       -0.15864    0.08670  -1.830 0.067287 .
>>>> > 1         0.28438    0.02853   9.968  < 2e-16 ***
>>>> > 2         0.11531    0.03021   3.817 0.000135 ***
>>>> > Residual deviance: 0.256
>>>> > r.squaredGLMM():
>>>> >        R2m        R2c
>>>> > 0.01063077 0.80039950
>>>> > This is what I get from this analysis:
>>>> >
>>>> > 1) Variance and SD of the random effect seem fine (definitely better
>>>> > than the "0" I got when using study area as the random effect);
>>>>
>>>>   yes -- the SD of the random effect is much larger than any of the
>>>> fixed effects, which means that the differences among individuals are
>>>> large (presumably that means you have very different numbers of
>>>> presences for different individuals [all individuals sharing a common
>>>> pool of pseudo-absences???])
>>>> >
>>>> > 2) Estimate values make sense from what I know of the species and the
>>>> > knowledge I have of the study areas;
>>>>
>>>>   Good!
>>>> >
>>>> > 3) Overdispersion values seem good, and R-squared values don't seem
>>>> > very good (at least when considering only the fixed effects) but, as
>>>> > I have read in several places, AIC and r-squared are not always in
>>>> > agreement.
>>>>
>>>>   Overdispersion is meaningless for binary data.
>>>> >
>>>> > 4) Weight values seem very low. Does it mean the models are not good?
>>>>
>>>>   It means there are many approximately equivalent models.  Nothing in
>>>> this output tells you very much about absolute goodness of fit (which is
>>>> tricky for binary data).
>>>> >
>>>> > Then what I did was fit a GLM ("glm()"), so no random effect was
>>>> > used. I used the same set of variables as in [1], and here are the
>>>> > results (only models with delta AICc lower than 2 shown):
>>>> >
>>>> > [2] Call:
>>>> > model.avg(object = dredge.glm_m1.1)
>>>> >
>>>> > Component model call:
>>>> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
>>>> >     data = All_SA_Used_RP_Area_z)
>>>> >
>>>> > Component models:
>>>> >           df   logLik     AICc   delta weight
>>>> > 12345678   9 -9251.85 18521.70    0.00   0.52
>>>> > 123456789 10 -9251.77 18523.54    1.84   0.21
>>>> > 1345678    8 -9253.84 18523.69    1.99   0.19
>>>> >
>>>> > In this case, weight values are higher.
>>>> >
>>>> > Does this mean that it is better not to use a random effect? (I am
>>>> > not sure I can compare GLMM results with GLM results; correct me if I
>>>> > am making the wrong assumptions.)
>>>>
>>>>   No.  You could do a likelihood ratio test with anova(), but note
>>>> that the AICc values for the glm() fits are 10,000 (!!) units higher
>>>> than for the glmer fits.
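>>>>
>>>>   A sketch of that comparison (model and data names hypothetical;
>>>> recent lme4 versions accept a glm fit in anova() alongside a merMod):
>>>>
>>>>   library(lme4)
>>>>   m_glmm <- glmer(Used ~ LC2_z + DH_z + (1 | ID.CODE_1),
>>>>                   data = dat, family = binomial)
>>>>   m_glm  <- glm(Used ~ LC2_z + DH_z, data = dat, family = binomial)
>>>>   anova(m_glmm, m_glm)  # LRT; the p-value for a variance component
>>>>                         # is conservative (null is on the boundary)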
>>>>
>>>>   While it will potentially greatly complicate your life, I think you
>>>> should at least *consider* interactions between your environment
>>>> variables and ID, i.e. allow for the possibility that different
>>>> individuals respond differently to habitat variation.
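>>>>
>>>>   For example (a sketch; the second form drops the intercept-slope
>>>> correlation, saving a parameter when random slopes must be rationed):
>>>>
>>>>   glmer(Used ~ LC2_z + DH_z + (1 + LC2_z | ID.CODE_1),
>>>>         data = dat, family = binomial)
>>>>   glmer(Used ~ LC2_z + DH_z + (1 | ID.CODE_1) + (0 + LC2_z | ID.CODE_1),
>>>>         data = dat, family = binomial)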
>>>>
>>>>   Ben Bolker
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>>
>>>
>>> --
>>> Craig DeMars, Ph.D.
>>> Postdoctoral Fellow
>>> Department of Biological Sciences
>>> University of Alberta
>>> Phone: 780-221-3971
>>>
>>>
>>
>
>
> --
> Craig DeMars, Ph.D.
> Postdoctoral Fellow
> Department of Biological Sciences
> University of Alberta
> Phone: 780-221-3971
>
>



