[R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)
David Atkins
datkins at u.washington.edu
Fri Jun 25 19:15:40 CEST 2010
John et al.--
Actually, looks like the current version of glmer *does* allow
observation level random-effects, though it throws you a little warning
(which seems entirely appropriate). (Thank you Doug!)
Using the data that we were recently discussing (and attached to one of
my previous posts "Data Redux"):
> drink.df$over <- 1:nrow(drink.df)
> drk.glmer <- glmer(drinks ~ weekday*gender + (1 | id) + (1 | over),
+ data = drink.df, family = poisson,
+ verbose = TRUE)
Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations
0: 96443.694: 1.63299 0.215642 -0.912787 -0.0135537 0.0244628
-0.0137854 0.504067 1.13637 1.06483 0.418800 0.366592 0.288067
0.354827 0.443268 0.392234 0.287877
[snip]
76: 73629.541: 4.35359 0.187209 -5.44079 -0.101647 -0.0249078
-0.0721097 0.513271 1.65032 1.51045 0.132999 0.435110 0.356431
0.382994 0.600458 1.26700 0.797529
> summary(drk.glmer)
Generalized linear mixed model fit by the Laplace approximation
Formula: drinks ~ weekday * gender + (1 | id) + (1 | over)
Data: drink.df
AIC BIC logLik deviance
73662 73805 -36815 73630
Random effects:
Groups Name Variance Std.Dev.
over (Intercept) 18.953746 4.35359
id (Intercept) 0.035047 0.18721
Number of obs: 56199, groups: over, 56199; id, 980
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.44079 0.15315 -35.53 < 2e-16 ***
weekdayMonday -0.10165 0.22164 -0.46 0.64651
weekdayTuesday -0.02491 0.21837 -0.11 0.90919
weekdayWednesday -0.07211 0.22089 -0.33 0.74408
weekdayThursday 0.51327 0.19953 2.57 0.01010 *
weekdayFriday 1.65032 0.17918 9.21 < 2e-16 ***
weekdaySaturday 1.51045 0.18023 8.38 < 2e-16 ***
genderM 0.13300 0.22493 0.59 0.55432
weekdayMonday:genderM 0.43511 0.31295 1.39 0.16442
weekdayTuesday:genderM 0.35643 0.31078 1.15 0.25142
weekdayWednesday:genderM 0.38299 0.31327 1.22 0.22150
weekdayThursday:genderM 0.60046 0.28439 2.11 0.03474 *
weekdayFriday:genderM 1.26700 0.25845 4.90 9.48e-07 ***
weekdaySaturday:genderM 0.79712 0.26107 3.05 0.00226 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As an aside, it is interesting to see for this particular data how the
variance swings pretty wildly between the model without over-dispersion
to the current one:
> summary(drk.glmer)
Generalized linear mixed model fit by the Laplace approximation
Formula: drinks ~ weekday * gender + (1 | id)
Data: drink.df
AIC BIC logLik deviance
146572 146706 -73271 146542
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.92314 0.9608
Number of obs: 56199, groups: id, 980
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.23639 0.04863 -25.42 < 2e-16 ***
weekdayMonday -0.01035 0.03331 -0.31 0.756
weekdayTuesday 0.02583 0.03304 0.78 0.434
weekdayWednesday -0.01161 0.03341 -0.35 0.728
weekdayThursday 0.50049 0.02976 16.82 < 2e-16 ***
weekdayFriday 1.12677 0.02691 41.87 < 2e-16 ***
weekdaySaturday 1.05954 0.02709 39.11 < 2e-16 ***
genderM 0.30064 0.07099 4.24 2.28e-05 ***
weekdayMonday:genderM 0.36970 0.04359 8.48 < 2e-16 ***
weekdayTuesday:genderM 0.29552 0.04358 6.78 1.19e-11 ***
weekdayWednesday:genderM 0.36295 0.04378 8.29 < 2e-16 ***
weekdayThursday:genderM 0.45315 0.03913 11.58 < 2e-16 ***
weekdayFriday:genderM 0.40290 0.03587 11.23 < 2e-16 ***
weekdaySaturday:genderM 0.29764 0.03624 8.21 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
These data in all likelihood are probably best modeled by a two-part
model, with zero vs. non-zero and count model for non-zeroes. The
current version of MCMCglmm allows for zero-inflated models (where
zeroes are a mixture of a point mass and count distribution), and the
development version has hurdle formulations (with zero vs. non-zero, and
then truncated count distribution for non-zeroes).
cheers, Dave
John Maindonald wrote:
I think it more accurate to say that, in general, there may be
a class of distributions, and therefore a possible multiplicity
of likelihoods, not necessarily for distributions of exponential
form. This is a PhD thesis asking to be done, or maybe
someone has already done it.
Over-dispersed distributions, where it is entirely clear what the
distribution is, can be generated as GLM model + one random
effect per observation. We have discussed this before. This
seems to me the preferred way to go, if such a model seems to
fit the data. I've not checked the current state of play re fitting
such models in lme4 of lme4a; in the past some versions have
allowed such a model.
I like the simplicity of the one random effect per observation
approach, as against what can seem the convoluted theoretical
framework in which beta binomials live.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
>
> On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
>
>> Since I am definitely *not* a mathematician, I am straying in over
my head
>> here.
>>
>> I understand what you are saying - that there isn't a likelihood
function
>> for the quasi-binomial "distribution". And therefore, there is no-such
>> distribution.
>>
>> What do you think of the suggestion that a beta-binomial mixture
>> distribution could be used to model overdispersed binomial data?
>>
>> Would this be a techinically correct and logistically feasibile
solution?
>>
>> -jeff
>>
>> -----Original Message-----
>> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf
Of Douglas
>> Bates
>> Sent: Thursday, June 24, 2010 1:25 PM
>> To: Jeffrey Evans
>> Cc: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions
(again)
>>
>> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
>> <Jeffrey.Evans at dartmouth.edu> wrote:
>>> Like others, I have experienced trouble with estimation of the scale
>>> parameter using the quasi-distributions in lme4, which is necessary to
>>> calculate QAICc and rank overdispersed generalized linear mixed models.
>>
>>> I had several exchanges with Ben Bolker about this early last year
>>> after his TREE paper came out
>>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809
>>> %29000 19-6), and I know it's been discussed on on this list. Has
>>> there been or is there any potential resolution to this forthcoming in
>>> future releases of
>>> lme4 or lme4a? I run into overdispersed binomial distributions
>>> frequently and have had to use SAS to deal with them. SAS appears to
>>> work, but it won't estimate the overdispersion parameter using laplace
>>> estimation (only PQL), As I understand it, these pseudo-Iikelihoods
>>> can't be used for model ranking. I don't know why SAS can't/won't, but
>>> lme4 will run these quasi-binomial and quasi-poisson distributions with
>> Laplace estimation.
>>
>>> Is there a workable way to use lme4 for modeling overdispersed
>>> binomial data?
>>
>> I have trouble discussing this because I come from a background as a
>> mathematician and am used to tracing derivations back to the original
>> definitions. So when I think of a likelihood (or, equivalently, a
>> deviance) to be optimized it only makes sense to me if there is a
>> probability distribution associated with the model. And for the
>> quasi-binomial and quasi-Poisson families, there isn't a probability
>> distribution. To me that means that discussing maximum likelihood
>> estimators for such models is nonsense. The models simply do not exist.
>> One can play tricks in the case of a generalized linear model to
estimate a
>> "quasi-parameter" that isn't part of the probability distribution
but it is
>> foolhardy to expect that the tricks will automatically carry over to a
>> generalized linear mixed model.
>>
>> I am not denying that data that are over-dispersed with respect to the
>> binomial or Poisson distributions can and do occur. But having data
like
>> this and a desire to model it doesn't make the quasi families real.
In his
>> signature Thierry Onkelinx quotes
>>
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body
of data.
>> ~ John Tukey
>>
>> I could and do plan to incorporate the negative binomial family but,
without
>> a definition that I can understand of a quasi-binomial or quasi-Poisson
>> distribution and its associated probability function, I'm stuck. To
me it's
>> a "build bricks without straw" situation - you can't find maximum
likelihood
>> estimates for parameters that aren't part of the likelihood.
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu
Center for the Study of Health and Risk Behaviors (CSHRB)
1100 NE 45th Street, Suite 300
Seattle, WA 98105
206-616-3879
http://depts.washington.edu/cshrb/
(Mon-Wed)
Center for Healthcare Improvement, for Addictions, Mental Illness,
Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104
http://www.chammp.org
(Thurs)
More information about the R-sig-mixed-models
mailing list