[R-sig-ME] Binomial model R^2 and ICC calculation with lme4

Paul Johnson paul.johnson at glasgow.ac.uk
Tue Oct 28 19:33:54 CET 2014


Hi Maurits,

I think you have answered your own question here:

“I got the answer myself, it appears that the different outcomes are due to the inclusion of a dispersion parameter in my model, introduced as a random effect.”
http://jonlefcheck.net/2013/03/13/r2-for-linear-mixed-effects-models/comment-page-1/#comment-2424

…but for the sake of posterity I’ll reply to the list.

As you say, the problem is the inclusion of the observation-level random effect (OLRE) in the denominator of the conditional R-squared (Rsq.c). By default, r.squaredGLMM in MuMIn includes all the random effects when calculating the total random-effects variance from a binomial GLMM fitted with glmer. The OLRE is a residual variance, so it represents unexplained variation and shouldn’t be included in the denominator of Rsq.c. Kamil Bartoń (cc-ed), who wrote and maintains MuMIn, solved this for glmer(…, family = poisson) but not for binomial models. Looking at getAnywhere(r.squaredGLMM.merMod), I think it would be straightforward to do this - what do you think, Kamil?

The OLRE variance should instead be included as Se, the additive dispersion variance. I’ve adapted your code to an example from ?glmer to illustrate this, and sketched an ICC calculation at the end of the code. You should be able to use these variance components to get the ICC you want following Table 1 of:
Nakagawa, S. & Schielzeth, H. (2010). Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews of the Cambridge Philosophical Society, 85, 935–56.

As an aside, slopes and intercepts are almost certain to be correlated (unless you centre the x very carefully), so it’s unusual to include random intercepts and slopes but not allow them to be correlated (the uncorrelated specification is sketched in the code below, for reference).

Good luck,
Paul



# adapt an example from ?glmer to make a random intercepts-and-slopes
# model

  library(lme4)
  library(MuMIn)

  cbpp$obs <- factor(1:nrow(cbpp))
  cbpp$period <- as.numeric(cbpp$period)
  (gm3 <- glmer(cbind(incidence, size - incidence) ~ period +
      (period | herd) +  (1|obs),
                family = binomial, data = cbpp))
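
# As mentioned in the aside above, these two fits are equivalent ways of
# specifying uncorrelated random intercepts and slopes for a numeric
# covariate (a sketch for illustration only; gm3b and gm3c are not used
# in the R-squared calculation that follows)

  gm3b <- glmer(cbind(incidence, size - incidence) ~ period +
      (period || herd) + (1 | obs),
                family = binomial, data = cbpp)
  gm3c <- glmer(cbind(incidence, size - incidence) ~ period +
      (1 | herd) + (0 + period | herd) + (1 | obs),
                family = binomial, data = cbpp)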

# fixed effects variance component

  X <- model.matrix(gm3)
  n <- nrow(X)
  Beta <- fixef(gm3)
  Sf <- var(X %*% Beta)

# random effects variance components
# note that I've excluded the "obs" (OLRE) variance component

  (Sigma.list <- VarCorr(gm3))

  Sl <- 
    sum(
      sapply(Sigma.list[names(Sigma.list)  != "obs"],
             function(Sigma)
             {
               Z <- X[, rownames(Sigma)]
               sum(diag(Z %*% Sigma %*% t(Z)))/n
             }))

# The additive dispersion shouldn't be fixed at one, but should instead be attributed to Se,
# the residual variance, which here is the variance of the observation-level random effect

  Se <- c(Sigma.list$obs)

# Finally, the distribution-specific variance, Sd, is pi^2/3 for binomial GLMMs:

  Sd <- pi^2/3

# Use eqns 29 and 30 from Nakagawa & Schielzeth to estimate marginal and
# conditional R-squared:

  total.var <- Sf + Sl + Se + Sd
  (Rsq.m <- Sf / total.var) 
  (Rsq.c <- (Sf + Sl) / total.var) 

# r.squaredGLMM gives the same Rsq.m but higher Rsq.c

  r.squaredGLMM(gm3)

# the Rsq.c from r.squaredGLMM has Se in the numerator:

  (Sf + Sl + Se) / total.var
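
# (a sketch, not in the original script) latent-scale ICC following
# Table 1 of Nakagawa & Schielzeth (2010): the herd-level variance over
# the total variance excluding the fixed-effects component. With random
# slopes, Sl is the mean herd-level variance across the observed values
# of period, so this is an average ICC over the design.

  (ICC.herd <- Sl / (Sl + Se + Sd))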



On 23 Oct 2014, at 11:41, Maurits Van Zinnicq Bergmann <mauritsvzb at gmail.com> wrote:

> Dear list,
> 
> I'm trying to calculate the marginal and conditional R^2 for a binomial GLMM using the lme4 package. Some googling a while ago brought me to the site of Jonathan Lefcheck (http://jonlefcheck.net/2013/03/13/r2-for-linear-mixed-effects-models/) which offers exactly this. However, more recently I became aware of a recent paper by Dr. Paul Johnson (http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12225/abstract) that explains the calculation of R^2 for random slope models and points to the r.squaredGLMM function in the MuMIn package.
> 
> Unfortunately I experience some inconsistencies and problems when comparing R^2 values of three different GLMMs, 1) a random intercept mixed model, 2) a mixed model with random covariance between intercept and slope, and 3) a mixed model without random covariance between intercept and slope, using the two methods:
> 
> 1) a comparison of marginal and conditional R^2 estimates between Lefcheck's method and the MuMIn package showed consistent marginal R^2 estimates for the first and the second GLMM, whereas the conditional R^2 values were lower using Lefcheck's method. The third mixed model showed differences in both conditional and marginal R^2. Could anyone explain why I obtain different R^2 estimates when comparing the two methods?
> 
> 2) When I calculate R^2 values using the code from the supporting information of Dr. Johnson's paper (slightly modified to reflect a binomial GLMM instead of a Poisson GLMM) I obtain different values when compared to the values obtained from the MuMIn package. If I calculated the variance components correctly, they should return identical R^2 values. While I don't have to bother with the mistake(s) I made in the code to obtain R^2 values (I can just use the package instead), I would like to know how to correctly calculate the variance components needed to calculate the ICC over the whole sample for all three models.
> 
> Thank you for any help.
> Maurits van Zinnicq
> 
> 
> 
> 
> 
> # random intercept model
> m1 <- glmer(cbind(df$Valid.detections, df$Missed.detections) ~ distance + 
>               Habitat + Replicate + transmitter.depth + receiver.depth + 
>               wind.speed + wtc + Transmitter + (1 | Unit) + 
>               (1 | SUR.ID) + distance:Transmitter + 
>               distance:Habitat + distance:transmitter.depth + distance:receiver.depth + 
>               distance:wind.speed, data = df, family = binomial(link = logit), control = glmerControl(calc.derivs = FALSE))
> 
> # random slope model
> m2 <- glmer(cbind(df$Valid.detections, df$Missed.detections) ~ distance + 
>                  Habitat + Replicate + transmitter.depth + receiver.depth + 
>                  wind.speed + wtc + Transmitter + (1 | Unit) + 
>                  (distance | SUR.ID) + distance:Transmitter + 
>                  distance:Habitat + distance:transmitter.depth + distance:receiver.depth + 
>                  distance:wind.speed, data = df, family = binomial(link = logit), control = glmerControl(calc.derivs = FALSE))
> 
> # random slope model without random covariance between slope/intercept
> m3 <- glmer(cbind(df$Valid.detections, df$Missed.detections) ~ distance + 
>               Habitat + Replicate + transmitter.depth + receiver.depth + 
>               wind.speed + wtc + Transmitter + (1 | Unit) + 
>               (1 | SUR.ID) + (0 + distance | SUR.ID) + distance:Transmitter + 
>               distance:Habitat + distance:transmitter.depth + distance:receiver.depth + 
>               distance:wind.speed, data = df, family = binomial(link = logit), control = glmerControl(calc.derivs = FALSE))
> 
> 
> 
> 
> 
> # adjusted code from Paul Johnson to reflect binomial GLMM
> 
> # Extract the variance components require for the R-squared statistics,
> # starting with the better-fitting random slopes model.
> 
> # First we need the design matrix (X), the number of observations (n)
> # and the fixed effect estimates (Beta)
> 
> X <- model.matrix(m1)
> n <- nrow(X)
> Beta <- fixef(m1)
> 
> # First the fixed effects variance (eqn 27 of Nakagawa & Schielzeth): 
> 
> Sf <- var(X %*% Beta)
> 
> # Second, the list of covariance matrices of the random effects.
> # Here the list has only a single element because there is only
> # one level of random effects.
> 
> (Sigma.list <- VarCorr(m1))
> 
> # Use equation 11 in the paper to estimate the random effects variance 
> # component.
> # Using sapply ensures that the same code will work when there are multiple
> # random effects (i.e. where length(Sigma.list) > 1)
> 
> Sl <- 
>   sum(
>     sapply(Sigma.list,
>            function(Sigma)
>            {
>              Z <- X[, rownames(Sigma)]
>              sum(diag(Z %*% Sigma %*% t(Z)))/n
>            }))
> 
> 
> # As this model is a binomial GLMM, the additive dispersion variance, Se, is
> # usually fixed to 1 for computational reasons, as additive dispersion is not identifiable (Nakagawa & Schielzeth 2013)
> 
> Se <- 1
> 
> # Finally, the distribution-specific variance, Sd, is pi^2/3 for binomial GLMMs:
> 
> Sd <- pi^2/3
> 
> # Use eqns 29 and 30 from Nakagawa & Schielzeth to estimate marginal and
> # conditional R-squared:
> 
> total.var <- Sf + Sl + Se + Sd
> (Rsq.m <- Sf / total.var) 
> (Rsq.c <- (Sf + Sl) / total.var) 