[R-sig-ME] lme4 and GLMMs: distribution choice (Gamma) and random effects syntax problems

Ben Bolker bbolker at gmail.com
Fri Jan 28 15:36:38 CET 2011


On 01/28/2011 07:02 AM, Roslyn Anderson wrote:
> Dear List Members,
> 


  There's a lot here, so I'm going to answer sparingly.

> I am working with generalised linear mixed models (GLMMs), mainly using the
> lme4 package in R (2.9.2 and 2.12.1).
> 
> The study involves looking at the effects of varying levels of grazing
> intensity (grazing states) on the biodiversity (birds, beetles and plants)
> of hill sheep farms. Here I will concentrate on the birds. The data is of a
> nested nature so I chose to use linear mixed effects models. None of the
> response variables are normally distributed so I could:
> 
> 
> 
> i)                    transform the response variables and run a linear
> mixed effects model or
> 
> ii)                   use a generalised linear mixed model
> 

  If the data are transformable in some *reasonable* way then
transformation + LMM will be somewhat easier and provide a greater range
of options. On the other hand, transformations may not be feasible or
may make interpretation somewhat less transparent (see comments below).

> Random effects are: *transects*, which are nested within *farms*, which are
> nested within *sites*
> 
> Fixed effects are: *grazing state* and *altitude*
>
> Response variable 1 is *bird density* (continuous, range: 1.61-36.05)
> 
> Error structure: GAMMA DISTRIBUTION

  In general for these 'gamma distributed' responses I would suggest
switching to log-normal responses instead (i.e., log-transform your data
and then use a LMM).  Gamma GLMMs are harder to fit, and are not working
yet in lme4 (as you found out).

 [snip]

> 
> Response variable 3 is *bird species richness* (counts, range: 1-19)
> 
> Error structure: POISSON OR NEGATIVE BINOMIAL DISTRIBUTION

  Count data are likely to be harder to transform, so here you probably
do want to use a GLMM.

> Response variable 4 is *bird evenness* (proportions, range: 0.52-1)
> 
> Error structure: BINOMIAL DISTRIBUTION

  You can't use a binomial GLMM **unless you know the denominator**.  If
you only have access to the proportions, then the parametric
distribution you're after would be the Beta distribution (although it
can't handle values of exactly 0 or 1 without some tweaking), and Beta
distributions are (a) not formally within the class of GLMMs and (b)
haven't been widely implemented as mixed models.
   If you have denominators, then go for Binomial.  Otherwise perhaps
see if you can logit-transform and work on that scale (although the 1.0
values will again give you trouble)?

> The birds were surveyed over two years (and three farms from 2007 were
> repeated in 2008):
> 
> 
> 
> *2007*
> 
> grazing.state = (1,2,5)
> 
> alt = altitude (continuous in metres)
> 
> site = area containing 3 farms (1,2,3,4)
> 
> farm = farm number (1,2,3,4,5,6,7,8,9,10,11,12)
> 
> t.id = transect identification number (1 – 10)
> 
> 
> 
> *2008*
> 
> grazing.state = (1,2,3,5)
> 
> alt = altitude (continuous in metres)
> 
> site = area containing 3 farms (1,2,3 (repeated from 2007), 5,6,7)
> 
> farm = farm number (3, 4, 8 (repeated from 2007), 13, 14, 15, 16, 17, 18,
> 19, 20, 21)
> 
> t.id = transect identification number (1 – 10)

   This is a fairly nasty level of unbalance. In principle there may be
differences between years, but you will have difficulty identifying them
because you have little overlap between years ... is grazing state
something you're willing to treat as a continuous variable?

  Do you have multiple observations per site/farm/transect combination,
or is that the lowest level of replication?  If the latter, then
"site/farm/transect" is confounded with the residual error; lmer, and
older versions of glmer, will complain.  (Recent versions of glmer allow
there to be *equal* numbers of observations and random effect levels, as
a way of incorporating overdispersion.)
> 
> Examples of model syntax:
>
> *library(nlme)*
> 
> a) *linear regression model*
> 
>> with(avian,{model1 <- lm(*bird.density* ~ grazing.state*alt)})
> 
> b) *generalised linear model*
> 
>> with(avian,{model2<-glm(*bird.density* ~ grazing.state * alt, family =
> Gamma, data = avian)})
> 
> c) *generalised linear model (poisson distribution, as bird species richness
> is count data)*
> 
>> with(avian,{model3 <- glm(*bird.sp.rich* ~ grazing.state*alt, family =
> poisson, data = avian)})
>
> *library(MASS)*
> 
> d) *generalised linear model (negative binomial distribution)*
> 
>> with(avian,{model4 <- glm.nb(*bird.sp.rich* ~ grazing.state * alt ,
> link="log",data= avian)})
> 
> e) *linear mixed effects model*
> 
>> with(avian,{model5<-lme(*bird.density* ~ grazing.state * alt, random = ~
> 1|site/farm/t.id, data = avian)})
> 
> *library(lme4)*
> 
> As density is continuous, I am using a Gamma distribution. However I can’t
> seem to get the GLMM to work in lme4:

  As mentioned above, this doesn't currently work in lme4 (as you found out)

> f) *generalised linear mixed model*
> 
>> with(avian,{model6a<-glmer(*bird.density* ~ grazing.state * alt +
> (1|site/farm/t.id), family = Gamma, link = log, data = avian)})
> 
> Error in function (fr, FL, glmFit, start, nAGQ, verbose)  :
> 
> Number of levels of a grouping factor for the random effects
> 
> must be less than n, the number of observations
> 
> Removing transect id gets rid of that error message, although I’m not too
> confident about the random error structure now. I then get a new error
> message:

  This surprises me a little bit.  What version of lme4 are you using?
> 
> 
> 
>> with(avian,{model6b<-glmer(*bird.density* ~ grazing.state * alt +
> (1|site/farm), family = Gamma, link = log, data = avian)})
> 
> Error in mer_finalize(ans) :
> 
>   mu[i] must be positive: mu = -0.0211949, i = 100
> 

  This is the problem I mentioned previously.  Stay tuned ...
> 

> There are no negative values here?
> 

  The negative value is generated internally, as a predicted value.  Not
your fault.
> Bird species richness (count data) does work:
> 
> 
> 
>> with(avian,{model6c<-glmer(*bird.sp.rich* ~ grazing.state * alt +
> (1|site/farm), family = poisson, link = log, data = avian)})

> I understand how to use the quasipoisson command to look at the dispersion
> parameter in a glm but I wasn’t sure how to do this in a glmm. I found this
> syntax on an earlier thread from the r-sig-mixed-models list:
>
>> (phi<-lme4:::sigma(model))
> 
> This worked but it gave me a figure of 1, which I was a little suspicious
> of, however, plotting the residuals shows no patterns so perhaps this is
> okay?

  Gives 1.0 because a poisson rather than quasipoisson was fitted.  See
on-line supplements to Bolker et al 2009 -- but NOTE that I now
recommend fitting an observation-level random effect instead of using
quasi- (which recent versions of lme4 don't allow anyway)
> 
> 
> 
> Bird evenness (proportions) doesn’t work either:
> 
> 
> 
>> with(avian,{model6d<-glmer(*bird.even* ~ grazing.state * alt +
> (1|site/farm), family = binomial, link = logit, data = avian)})
> 
> Error in glmer(bird.even ~ grazing.state * alt + (1 | site/farm), family =
> binomial,  :
> 
>   object 'logit' not found

  correct syntax would be family=binomial(link="logit") [or simply
family=binomial, since logit is the default link.  As noted above this
won't work properly since your evenness stats are proportions without
known denominators
> 
> 
> 
>> with(avian,{model6e<-glmer(*bird.even* ~ grazing.state * alt +
> (1|site/farm), family = binomial, data = avian)})
> 
> Warning messages:
> 
> 1: In eval(expr, envir, enclos) :
> 
>   non-integer #successes in a binomial glm!

  As noted above.

> 
> 2: In mer_finalize(ans) : singular convergence (7)
> 
> 
> 
> I also tried link = quasibinomial, which works in R 2.9.2 but not in R
> 2.12.1.
> 
> 
> 
> Going back to bird density. It seems to works in *MASS* but a message
> appears saying iteration 1? It still gives me a model summary though.
> 
> 
> 
>> with(avian,{model7<-glmmPQL(*bird.density* ~ grazing.state * alt, random =
> ~ 1|site/farm/t.id, family = Gamma, data = avian)})
> 
> iteration 1
> 
> 
> 
> Bird diversity, species richness and evenness give larger error messages.
> 
> 
> 
> I have also run the models using *glmmML* (although species richness is the
> only one which works as it only allows poisson and binomial distributions).
> 
> 
> 
>> with(avian,{model8<-glmmML(*bird.sp.rich* ~ grazing.state * alt, cluster =
> site, family = poisson, data = avian)})
> 
> 
> 
> I tried *glmmADMB* too (in R 2.9.2) but couldn’t get it to work at all.
> 

  Can you give more specifics on this in a separate e-mail?  I am trying
to improve glmmADMB. Please install the latest version of glmmADMB
(install.packages("glmmADMB",repos="http://r-forge.r-project.org")) in
2.12.1 and let me know how you do.

> 
> So, my questions are:
> 
> 
> 
> Q1) Do you think I am having so many problems fitting glmm models to my data
> because the response variables are not counts or proper proportions? I had
> originally thought that Gamma would be able to deal with continuous data. Do
> you think that given all my problems, it may be more sensible to transform
> the response variables and run linear mixed effects models instead?

   Yes for the skewed ones -- I would go with log transformation as long
as it seems reasonable

> 
> 
> 
> Q2) My random effects structure is nested (1|site/farm/t.id), as opposed to
> crossed (1|site) + (1|farm) + (1|t.id). However warning messages appear in
> most of the glmm unless I remove t.id, and in some cases farm too. I had
> read about adding a random effect with a distinct level for each of my 120
> observations, however this doesn’t seem to work either. Have I made a
> fundamental error in my random effects structure?

   See comments above.
> 
> 
> 
> Q3) I also have a fifth response variable which I would like to look at. It
> is bird beta diversity (Sorensen’s Index). Data are in the form of
> proportions (range: 0.18-0.53). However, the data show a clear
> *bimodal*distribution. I understand that transformations are not
> really an option but
> I have read that *finite mixture models* may be a good alternative and was
> just wondering if anyone may be able to suggest some R syntax I might use to
> tackle this?
> 

   Remember that it is not the *response variable* but the
*residuals*/*error distribution* that must be follow a particular
distribution.  Is bird beta diversity really bimodally distributed
*within* a given (small) range of predictors?

  good luck,
    Ben Bolker




More information about the R-sig-mixed-models mailing list