[R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)
David Atkins
datkins at u.washington.edu
Mon Jun 28 23:04:09 CEST 2010
Jeffrey Evans wrote:
> Hi Dave, I see that Ben Bolker et al's wiki page also references it.
>
> I'm not able to get glmer to do this. I do something similar with my dataset
> (dat$over = 1:nrow(dat)) and I get an error, not a warning.
>
>> Xfit$over = 1:nrow(Xfit)
>> m800 = glmer(cbind(gmdat$SdlFinal, gmdat$SdlMax-gmdat$SdlFinal)
> ~soilpc3+(1|gmdat$ID)+(1|over),data=Xfit, family="binomial")
>
> Error in function (fr, FL, glmFit, start, nAGQ, verbose) :
> Number of levels of a grouping factor for the random effects
> must be less than the number of observations
>
> What version are you using? I'm running version 33.
Sorry, I should have included sessionInfo() in my earlier email:
Looks like I'm running v 34 with Matrix v 41 (on R 2.11.1)
[And I did just re-confirm that it works with a different dataset...]
cheers, dave
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] lme4_0.999375-34 Matrix_0.999375-41 lattice_0.19-7
[4] car_2.0-0 survival_2.35-9 leaps_2.9
[7] nnet_7.3-1 MASS_7.3-6
loaded via a namespace (and not attached):
[1] grid_2.11.1 MCMCglmm_2.05 memisc_0.95-30 nlme_3.1-96
[5] stats4_2.11.1
>
> Cheers,
> Jeff
>
>
>
> John et al.--
>
> Actually, looks like the current version of glmer *does* allow
> observation level random-effects, though it throws you a little warning
> (which seems entirely appropriate). (Thank you Doug!)
>
> Using the data that we were recently discussing (and attached to one of
> my previous posts "Data Redux"):
>
> > drink.df$over <- 1:nrow(drink.df)
> > drk.glmer <- glmer(drinks ~ weekday*gender + (1 | id) + (1 | over),
> + data = drink.df, family = poisson,
> + verbose = TRUE)
> Number of levels of a grouping factor for the random effects
> is *equal* to n, the number of observations
> 0: 96443.694: 1.63299 0.215642 -0.912787 -0.0135537 0.0244628
> -0.0137854 0.504067 1.13637 1.06483 0.418800 0.366592 0.288067
> 0.354827 0.443268 0.392234 0.287877
>
> [snip]
>
> 76: 73629.541: 4.35359 0.187209 -5.44079 -0.101647 -0.0249078
> -0.0721097 0.513271 1.65032 1.51045 0.132999 0.435110 0.356431
> 0.382994 0.600458 1.26700 0.797529
>
> > summary(drk.glmer)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: drinks ~ weekday * gender + (1 | id) + (1 | over)
> Data: drink.df
> AIC BIC logLik deviance
> 73662 73805 -36815 73630
> Random effects:
> Groups Name Variance Std.Dev.
> over (Intercept) 18.953746 4.35359
> id (Intercept) 0.035047 0.18721
> Number of obs: 56199, groups: over, 56199; id, 980
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -5.44079 0.15315 -35.53 < 2e-16 ***
> weekdayMonday -0.10165 0.22164 -0.46 0.64651
> weekdayTuesday -0.02491 0.21837 -0.11 0.90919
> weekdayWednesday -0.07211 0.22089 -0.33 0.74408
> weekdayThursday 0.51327 0.19953 2.57 0.01010 *
> weekdayFriday 1.65032 0.17918 9.21 < 2e-16 ***
> weekdaySaturday 1.51045 0.18023 8.38 < 2e-16 ***
> genderM 0.13300 0.22493 0.59 0.55432
> weekdayMonday:genderM 0.43511 0.31295 1.39 0.16442
> weekdayTuesday:genderM 0.35643 0.31078 1.15 0.25142
> weekdayWednesday:genderM 0.38299 0.31327 1.22 0.22150
> weekdayThursday:genderM 0.60046 0.28439 2.11 0.03474 *
> weekdayFriday:genderM 1.26700 0.25845 4.90 9.48e-07 ***
> weekdaySaturday:genderM 0.79712 0.26107 3.05 0.00226 **
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> As an aside, it is interesting to see for this particular data how the
> variance swings pretty wildly between the model without over-dispersion
> to the current one:
>
> > summary(drk.glmer)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: drinks ~ weekday * gender + (1 | id)
> Data: drink.df
> AIC BIC logLik deviance
> 146572 146706 -73271 146542
> Random effects:
> Groups Name Variance Std.Dev.
> id (Intercept) 0.92314 0.9608
> Number of obs: 56199, groups: id, 980
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.23639 0.04863 -25.42 < 2e-16 ***
> weekdayMonday -0.01035 0.03331 -0.31 0.756
> weekdayTuesday 0.02583 0.03304 0.78 0.434
> weekdayWednesday -0.01161 0.03341 -0.35 0.728
> weekdayThursday 0.50049 0.02976 16.82 < 2e-16 ***
> weekdayFriday 1.12677 0.02691 41.87 < 2e-16 ***
> weekdaySaturday 1.05954 0.02709 39.11 < 2e-16 ***
> genderM 0.30064 0.07099 4.24 2.28e-05 ***
> weekdayMonday:genderM 0.36970 0.04359 8.48 < 2e-16 ***
> weekdayTuesday:genderM 0.29552 0.04358 6.78 1.19e-11 ***
> weekdayWednesday:genderM 0.36295 0.04378 8.29 < 2e-16 ***
> weekdayThursday:genderM 0.45315 0.03913 11.58 < 2e-16 ***
> weekdayFriday:genderM 0.40290 0.03587 11.23 < 2e-16 ***
> weekdaySaturday:genderM 0.29764 0.03624 8.21 < 2e-16 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> These data in all likelihood are probably best modeled by a two-part
> model, with zero vs. non-zero and count model for non-zeroes. The
> current version of MCMCglmm allows for zero-inflated models (where
> zeroes are a mixture of a point mass and count distribution), and the
> development version has hurdle formulations (with zero vs. non-zero, and
> then truncated count distribution for non-zeroes).
>
> cheers, Dave
>
> John Maindonald wrote:
>
> I think it more accurate to say that, in general, there may be
> a class of distributions, and therefore a possible multiplicity
> of likelihoods, not necessarily for distributions of exponential
> form. This is a PhD thesis asking to be done, or maybe
> someone has already done it.
>
> Over-dispersed distributions, where it is entirely clear what the
> distribution is, can be generated as GLM model + one random
> effect per observation. We have discussed this before. This
> seems to me the preferred way to go, if such a model seems to
> fit the data. I've not checked the current state of play re fitting
> such models in lme4 of lme4a; in the past some versions have
> allowed such a model.
>
> I like the simplicity of the one random effect per observation
> approach, as against what can seem the convoluted theoretical
> framework in which beta binomials live.
>
> John Maindonald email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> >
> > On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
> >
> >> Since I am definitely *not* a mathematician, I am straying in over
> my head
> >> here.
> >>
> >> I understand what you are saying - that there isn't a likelihood
> function
> >> for the quasi-binomial "distribution". And therefore, there is no-such
> >> distribution.
> >>
> >> What do you think of the suggestion that a beta-binomial mixture
> >> distribution could be used to model overdispersed binomial data?
> >>
> >> Would this be a techinically correct and logistically feasibile
> solution?
> >>
> >> -jeff
> >>
> >> -----Original Message-----
> >> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf
> Of Douglas
> >> Bates
> >> Sent: Thursday, June 24, 2010 1:25 PM
> >> To: Jeffrey Evans
> >> Cc: r-sig-mixed-models at r-project.org
> >> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions
> (again)
> >>
> >> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
> >> <Jeffrey.Evans at dartmouth.edu> wrote:
> >>> Like others, I have experienced trouble with estimation of the scale
> >>> parameter using the quasi-distributions in lme4, which is necessary to
> >>> calculate QAICc and rank overdispersed generalized linear mixed models.
> >>
> >>> I had several exchanges with Ben Bolker about this early last year
> >>> after his TREE paper came out
> >>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809
> >>> %29000 19-6), and I know it's been discussed on on this list. Has
> >>> there been or is there any potential resolution to this forthcoming in
> >>> future releases of
> >>> lme4 or lme4a? I run into overdispersed binomial distributions
> >>> frequently and have had to use SAS to deal with them. SAS appears to
> >>> work, but it won't estimate the overdispersion parameter using laplace
> >>> estimation (only PQL), As I understand it, these pseudo-Iikelihoods
> >>> can't be used for model ranking. I don't know why SAS can't/won't, but
> >>> lme4 will run these quasi-binomial and quasi-poisson distributions with
> >> Laplace estimation.
> >>
> >>> Is there a workable way to use lme4 for modeling overdispersed
> >>> binomial data?
> >>
> >> I have trouble discussing this because I come from a background as a
> >> mathematician and am used to tracing derivations back to the original
> >> definitions. So when I think of a likelihood (or, equivalently, a
> >> deviance) to be optimized it only makes sense to me if there is a
> >> probability distribution associated with the model. And for the
> >> quasi-binomial and quasi-Poisson families, there isn't a probability
> >> distribution. To me that means that discussing maximum likelihood
> >> estimators for such models is nonsense. The models simply do not exist.
> >> One can play tricks in the case of a generalized linear model to
> estimate a
> >> "quasi-parameter" that isn't part of the probability distribution
> but it is
> >> foolhardy to expect that the tricks will automatically carry over to a
> >> generalized linear mixed model.
> >>
> >> I am not denying that data that are over-dispersed with respect to the
> >> binomial or Poisson distributions can and do occur. But having data
> like
> >> this and a desire to model it doesn't make the quasi families real.
> In his
> >> signature Thierry Onkelinx quotes
> >>
> >> The combination of some data and an aching desire for an answer does not
> >> ensure that a reasonable answer can be extracted from a given body
> of data.
> >> ~ John Tukey
> >>
> >> I could and do plan to incorporate the negative binomial family but,
> without
> >> a definition that I can understand of a quasi-binomial or quasi-Poisson
> >> distribution and its associated probability function, I'm stuck. To
> me it's
> >> a "build bricks without straw" situation - you can't find maximum
> likelihood
> >> estimates for parameters that aren't part of the likelihood.
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> John Maindonald email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
More information about the R-sig-mixed-models
mailing list