[R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)

David Atkins datkins at u.washington.edu
Mon Jun 28 23:04:09 CEST 2010



Jeffrey Evans wrote:
> Hi Dave, I see that Ben Bolker et al's wiki page also references it.
> 
> I'm not able to get glmer to do this. I do something similar with my dataset
> (dat$over = 1:nrow(dat)) and I get an error, not a warning.
> 
>> Xfit$over = 1:nrow(Xfit)
>> m800 = glmer(cbind(gmdat$SdlFinal, gmdat$SdlMax-gmdat$SdlFinal)
> ~soilpc3+(1|gmdat$ID)+(1|over),data=Xfit, family="binomial")
> 
> Error in function (fr, FL, glmFit, start, nAGQ, verbose)  : 
>   Number of levels of a grouping factor for the random effects
> must be less than the number of observations
> 
> What version are you using? I'm running version 33.

Sorry, I should have included sessionInfo() in my earlier email:

Looks like I'm running v 34 with Matrix v 41 (on R 2.11.1)

[And I did just re-confirm that it works with a different dataset...]

cheers, dave

 > sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] lme4_0.999375-34   Matrix_0.999375-41 lattice_0.19-7
[4] car_2.0-0          survival_2.35-9    leaps_2.9
[7] nnet_7.3-1         MASS_7.3-6

loaded via a namespace (and not attached):
[1] grid_2.11.1    MCMCglmm_2.05  memisc_0.95-30 nlme_3.1-96
[5] stats4_2.11.1

> 
> Cheers,
> Jeff
> 
> 
> 
> John et al.--
> 
> Actually, looks like the current version of glmer *does* allow 
> observation level random-effects, though it throws you a little warning 
> (which seems entirely appropriate). (Thank you Doug!)
> 
> Using the data that we were recently discussing (and attached to one of 
> my previous posts "Data Redux"):
> 
>  > drink.df$over <- 1:nrow(drink.df)
>  > drk.glmer <- glmer(drinks ~ weekday*gender + (1 | id) + (1 | over),
> + 					data = drink.df, family = poisson,
> + 					verbose = TRUE)
> Number of levels of a grouping factor for the random effects
> is *equal* to n, the number of observations
>    0:     96443.694:  1.63299 0.215642 -0.912787 -0.0135537 0.0244628 
> -0.0137854 0.504067  1.13637  1.06483 0.418800 0.366592 0.288067 
> 0.354827 0.443268 0.392234 0.287877
> 
> [snip]
> 
>   76:     73629.541:  4.35359 0.187209 -5.44079 -0.101647 -0.0249078 
> -0.0721097 0.513271  1.65032  1.51045 0.132999 0.435110 0.356431 
> 0.382994 0.600458  1.26700 0.797529
> 
>  > summary(drk.glmer)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: drinks ~ weekday * gender + (1 | id) + (1 | over)
>     Data: drink.df
>     AIC   BIC logLik deviance
>   73662 73805 -36815    73630
> Random effects:
>   Groups Name        Variance  Std.Dev.
>   over   (Intercept) 18.953746 4.35359
>   id     (Intercept)  0.035047 0.18721
> Number of obs: 56199, groups: over, 56199; id, 980
> 
> Fixed effects:
>                           Estimate Std. Error z value Pr(>|z|)
> (Intercept)              -5.44079    0.15315  -35.53  < 2e-16 ***
> weekdayMonday            -0.10165    0.22164   -0.46  0.64651
> weekdayTuesday           -0.02491    0.21837   -0.11  0.90919
> weekdayWednesday         -0.07211    0.22089   -0.33  0.74408
> weekdayThursday           0.51327    0.19953    2.57  0.01010 *
> weekdayFriday             1.65032    0.17918    9.21  < 2e-16 ***
> weekdaySaturday           1.51045    0.18023    8.38  < 2e-16 ***
> genderM                   0.13300    0.22493    0.59  0.55432
> weekdayMonday:genderM     0.43511    0.31295    1.39  0.16442
> weekdayTuesday:genderM    0.35643    0.31078    1.15  0.25142
> weekdayWednesday:genderM  0.38299    0.31327    1.22  0.22150
> weekdayThursday:genderM   0.60046    0.28439    2.11  0.03474 *
> weekdayFriday:genderM     1.26700    0.25845    4.90 9.48e-07 ***
> weekdaySaturday:genderM   0.79712    0.26107    3.05  0.00226 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> As an aside, it is interesting to see for this particular data how the 
> variance swings pretty wildly between the model without over-dispersion 
> to the current one:
> 
>  > summary(drk.glmer)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: drinks ~ weekday * gender + (1 | id)
>     Data: drink.df
>      AIC    BIC logLik deviance
>   146572 146706 -73271   146542
> Random effects:
>   Groups Name        Variance Std.Dev.
>   id     (Intercept) 0.92314  0.9608
> Number of obs: 56199, groups: id, 980
> 
> Fixed effects:
>                           Estimate Std. Error z value Pr(>|z|)
> (Intercept)              -1.23639    0.04863  -25.42  < 2e-16 ***
> weekdayMonday            -0.01035    0.03331   -0.31    0.756
> weekdayTuesday            0.02583    0.03304    0.78    0.434
> weekdayWednesday         -0.01161    0.03341   -0.35    0.728
> weekdayThursday           0.50049    0.02976   16.82  < 2e-16 ***
> weekdayFriday             1.12677    0.02691   41.87  < 2e-16 ***
> weekdaySaturday           1.05954    0.02709   39.11  < 2e-16 ***
> genderM                   0.30064    0.07099    4.24 2.28e-05 ***
> weekdayMonday:genderM     0.36970    0.04359    8.48  < 2e-16 ***
> weekdayTuesday:genderM    0.29552    0.04358    6.78 1.19e-11 ***
> weekdayWednesday:genderM  0.36295    0.04378    8.29  < 2e-16 ***
> weekdayThursday:genderM   0.45315    0.03913   11.58  < 2e-16 ***
> weekdayFriday:genderM     0.40290    0.03587   11.23  < 2e-16 ***
> weekdaySaturday:genderM   0.29764    0.03624    8.21  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> These data in all likelihood are probably best modeled by a two-part 
> model, with zero vs. non-zero and count model for non-zeroes.  The 
> current version of MCMCglmm allows for zero-inflated models (where 
> zeroes are a mixture of a point mass and count distribution), and the 
> development version has hurdle formulations (with zero vs. non-zero, and 
> then truncated count distribution for non-zeroes).
> 
> cheers, Dave
> 
> John Maindonald wrote:
> 
> I think it more accurate to say that, in general, there may be
> a class of distributions, and therefore a possible multiplicity
> of likelihoods, not necessarily for distributions of exponential
> form.  This is a PhD thesis asking to be done, or maybe
> someone has already done it.
> 
> Over-dispersed distributions, where it is entirely clear what the
> distribution is, can be generated as GLM model +  one random
> effect per observation.  We have discussed this before.  This
> seems to me the preferred way to go, if such a model seems to
> fit the data.  I've not checked the current state of play re fitting
> such models in lme4 of lme4a; in the past some versions have
> allowed such a model.
> 
> I like the simplicity of the one random effect per observation
> approach, as against what can seem the convoluted theoretical
> framework in which beta binomials live.
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
> 
>  >
>  > On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
>  >
>  >> Since I am definitely *not* a mathematician, I am straying in over 
> my head
>  >> here.
>  >>
>  >> I understand what you are saying - that there isn't a likelihood 
> function
>  >> for the quasi-binomial "distribution". And therefore, there is no-such
>  >> distribution.
>  >>
>  >> What do you think of the suggestion that a beta-binomial mixture
>  >> distribution could be used to model overdispersed binomial data?
>  >>
>  >> Would this be a techinically correct and logistically feasibile 
> solution?
>  >>
>  >> -jeff
>  >>
>  >> -----Original Message-----
>  >> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf 
> Of Douglas
>  >> Bates
>  >> Sent: Thursday, June 24, 2010 1:25 PM
>  >> To: Jeffrey Evans
>  >> Cc: r-sig-mixed-models at r-project.org
>  >> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions 
> (again)
>  >>
>  >> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
>  >> <Jeffrey.Evans at dartmouth.edu> wrote:
>  >>> Like others, I have experienced trouble with estimation of the scale
>  >>> parameter using the quasi-distributions in lme4, which is necessary to
>  >>> calculate QAICc and rank overdispersed generalized linear mixed models.
>  >>
>  >>> I had several exchanges with Ben Bolker about this early last year
>  >>> after his TREE paper came out
>  >>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809
>  >>> %29000 19-6), and I know it's been discussed on on this list. Has
>  >>> there been or is there any potential resolution to this forthcoming in
>  >>> future releases of
>  >>> lme4 or lme4a? I run into overdispersed binomial distributions
>  >>> frequently and have had to use SAS to deal with them. SAS appears to
>  >>> work, but it won't estimate the overdispersion parameter using laplace
>  >>> estimation (only PQL), As I understand it, these pseudo-Iikelihoods
>  >>> can't be used for model ranking. I don't know why SAS can't/won't, but
>  >>> lme4 will run these quasi-binomial and quasi-poisson distributions with
>  >> Laplace estimation.
>  >>
>  >>> Is there a workable way to use lme4 for modeling overdispersed
>  >>> binomial data?
>  >>
>  >> I have trouble discussing this because I come from a background as a
>  >> mathematician and am used to tracing derivations back to the original
>  >> definitions.  So when I think of a likelihood (or, equivalently, a
>  >> deviance) to be optimized it only makes sense to me if there is a
>  >> probability distribution associated with the model.  And for the
>  >> quasi-binomial and quasi-Poisson families, there isn't a probability
>  >> distribution.  To me that means that discussing maximum likelihood
>  >> estimators for such models is nonsense.  The models simply do not exist.
>  >> One can play tricks in the case of a generalized linear model to 
> estimate a
>  >> "quasi-parameter" that isn't part of the probability distribution 
> but it is
>  >> foolhardy to expect that the tricks will automatically carry over to a
>  >> generalized linear mixed model.
>  >>
>  >> I am not denying that data that are over-dispersed with respect to the
>  >> binomial or Poisson distributions can and do occur.  But having data 
> like
>  >> this and a desire to model it doesn't make the quasi families real. 
>   In his
>  >> signature Thierry Onkelinx quotes
>  >>
>  >> The combination of some data and an aching desire for an answer does not
>  >> ensure that a reasonable answer can be extracted from a given body 
> of data.
>  >> ~ John Tukey
>  >>
>  >> I could and do plan to incorporate the negative binomial family but, 
> without
>  >> a definition that I can understand of a quasi-binomial or quasi-Poisson
>  >> distribution and its associated probability function, I'm stuck. To 
> me it's
>  >> a "build bricks without straw" situation - you can't find maximum 
> likelihood
>  >> estimates for parameters that aren't part of the likelihood.
>  >>
>  >> _______________________________________________
>  >> R-sig-mixed-models at r-project.org mailing list
>  >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>  >
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>




More information about the R-sig-mixed-models mailing list