[R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)

Fri Jun 25 19:15:40 CEST 2010

John et al.--

Actually, looks like the current version of glmer *does* allow 
observation level random-effects, though it throws you a little warning 
(which seems entirely appropriate). (Thank you Doug!)

Using the data that we were recently discussing (and attached to one of 
my previous posts "Data Redux"):

 > drink.df$over <- 1:nrow(drink.df)
 > drk.glmer <- glmer(drinks ~ weekday*gender + (1 | id) + (1 | over),
+ 					data = drink.df, family = poisson,
+ 					verbose = TRUE)
Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations
   0:     96443.694:  1.63299 0.215642 -0.912787 -0.0135537 0.0244628 
-0.0137854 0.504067  1.13637  1.06483 0.418800 0.366592 0.288067 
0.354827 0.443268 0.392234 0.287877

[snip]

  76:     73629.541:  4.35359 0.187209 -5.44079 -0.101647 -0.0249078 
-0.0721097 0.513271  1.65032  1.51045 0.132999 0.435110 0.356431 
0.382994 0.600458  1.26700 0.797529

 > summary(drk.glmer)
Generalized linear mixed model fit by the Laplace approximation
Formula: drinks ~ weekday * gender + (1 | id) + (1 | over)
    Data: drink.df
    AIC   BIC logLik deviance
  73662 73805 -36815    73630
Random effects:
  Groups Name        Variance  Std.Dev.
  over   (Intercept) 18.953746 4.35359
  id     (Intercept)  0.035047 0.18721
Number of obs: 56199, groups: over, 56199; id, 980

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              -5.44079    0.15315  -35.53  < 2e-16 ***
weekdayMonday            -0.10165    0.22164   -0.46  0.64651
weekdayTuesday           -0.02491    0.21837   -0.11  0.90919
weekdayWednesday         -0.07211    0.22089   -0.33  0.74408
weekdayThursday           0.51327    0.19953    2.57  0.01010 *
weekdayFriday             1.65032    0.17918    9.21  < 2e-16 ***
weekdaySaturday           1.51045    0.18023    8.38  < 2e-16 ***
genderM                   0.13300    0.22493    0.59  0.55432
weekdayMonday:genderM     0.43511    0.31295    1.39  0.16442
weekdayTuesday:genderM    0.35643    0.31078    1.15  0.25142
weekdayWednesday:genderM  0.38299    0.31327    1.22  0.22150
weekdayThursday:genderM   0.60046    0.28439    2.11  0.03474 *
weekdayFriday:genderM     1.26700    0.25845    4.90 9.48e-07 ***
weekdaySaturday:genderM   0.79712    0.26107    3.05  0.00226 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As an aside, it is interesting to see for this particular data how the 
variance swings pretty wildly between the model without over-dispersion 
to the current one:

 > summary(drk.glmer)
Generalized linear mixed model fit by the Laplace approximation
Formula: drinks ~ weekday * gender + (1 | id)
    Data: drink.df
     AIC    BIC logLik deviance
  146572 146706 -73271   146542
Random effects:
  Groups Name        Variance Std.Dev.
  id     (Intercept) 0.92314  0.9608
Number of obs: 56199, groups: id, 980

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              -1.23639    0.04863  -25.42  < 2e-16 ***
weekdayMonday            -0.01035    0.03331   -0.31    0.756
weekdayTuesday            0.02583    0.03304    0.78    0.434
weekdayWednesday         -0.01161    0.03341   -0.35    0.728
weekdayThursday           0.50049    0.02976   16.82  < 2e-16 ***
weekdayFriday             1.12677    0.02691   41.87  < 2e-16 ***
weekdaySaturday           1.05954    0.02709   39.11  < 2e-16 ***
genderM                   0.30064    0.07099    4.24 2.28e-05 ***
weekdayMonday:genderM     0.36970    0.04359    8.48  < 2e-16 ***
weekdayTuesday:genderM    0.29552    0.04358    6.78 1.19e-11 ***
weekdayWednesday:genderM  0.36295    0.04378    8.29  < 2e-16 ***
weekdayThursday:genderM   0.45315    0.03913   11.58  < 2e-16 ***
weekdayFriday:genderM     0.40290    0.03587   11.23  < 2e-16 ***
weekdaySaturday:genderM   0.29764    0.03624    8.21  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These data in all likelihood are probably best modeled by a two-part 
model, with zero vs. non-zero and count model for non-zeroes.  The 
current version of MCMCglmm allows for zero-inflated models (where 
zeroes are a mixture of a point mass and count distribution), and the 
development version has hurdle formulations (with zero vs. non-zero, and 
then truncated count distribution for non-zeroes).

cheers, Dave

John Maindonald wrote:

I think it more accurate to say that, in general, there may be
a class of distributions, and therefore a possible multiplicity
of likelihoods, not necessarily for distributions of exponential
form.  This is a PhD thesis asking to be done, or maybe
someone has already done it.

Over-dispersed distributions, where it is entirely clear what the
distribution is, can be generated as GLM model +  one random
effect per observation.  We have discussed this before.  This
seems to me the preferred way to go, if such a model seems to
fit the data.  I've not checked the current state of play re fitting
such models in lme4 of lme4a; in the past some versions have
allowed such a model.

I like the simplicity of the one random effect per observation
approach, as against what can seem the convoluted theoretical
framework in which beta binomials live.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

 >
 > On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
 >
 >> Since I am definitely *not* a mathematician, I am straying in over 
my head
 >> here.
 >>
 >> I understand what you are saying - that there isn't a likelihood 
function
 >> for the quasi-binomial "distribution". And therefore, there is no-such
 >> distribution.
 >>
 >> What do you think of the suggestion that a beta-binomial mixture
 >> distribution could be used to model overdispersed binomial data?
 >>
 >> Would this be a techinically correct and logistically feasibile 
solution?
 >>
 >> -jeff
 >>
 >> -----Original Message-----
 >> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf 
Of Douglas
 >> Bates
 >> Sent: Thursday, June 24, 2010 1:25 PM
 >> To: Jeffrey Evans
 >> Cc: r-sig-mixed-models at r-project.org
 >> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions 
(again)
 >>
 >> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
 >> <Jeffrey.Evans at dartmouth.edu> wrote:
 >>> Like others, I have experienced trouble with estimation of the scale
 >>> parameter using the quasi-distributions in lme4, which is necessary to
 >>> calculate QAICc and rank overdispersed generalized linear mixed models.
 >>
 >>> I had several exchanges with Ben Bolker about this early last year
 >>> after his TREE paper came out
 >>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809
 >>> %29000 19-6), and I know it's been discussed on on this list. Has
 >>> there been or is there any potential resolution to this forthcoming in
 >>> future releases of
 >>> lme4 or lme4a? I run into overdispersed binomial distributions
 >>> frequently and have had to use SAS to deal with them. SAS appears to
 >>> work, but it won't estimate the overdispersion parameter using laplace
 >>> estimation (only PQL), As I understand it, these pseudo-Iikelihoods
 >>> can't be used for model ranking. I don't know why SAS can't/won't, but
 >>> lme4 will run these quasi-binomial and quasi-poisson distributions with
 >> Laplace estimation.
 >>
 >>> Is there a workable way to use lme4 for modeling overdispersed
 >>> binomial data?
 >>
 >> I have trouble discussing this because I come from a background as a
 >> mathematician and am used to tracing derivations back to the original
 >> definitions.  So when I think of a likelihood (or, equivalently, a
 >> deviance) to be optimized it only makes sense to me if there is a
 >> probability distribution associated with the model.  And for the
 >> quasi-binomial and quasi-Poisson families, there isn't a probability
 >> distribution.  To me that means that discussing maximum likelihood
 >> estimators for such models is nonsense.  The models simply do not exist.
 >> One can play tricks in the case of a generalized linear model to 
estimate a
 >> "quasi-parameter" that isn't part of the probability distribution 
but it is
 >> foolhardy to expect that the tricks will automatically carry over to a
 >> generalized linear mixed model.
 >>
 >> I am not denying that data that are over-dispersed with respect to the
 >> binomial or Poisson distributions can and do occur.  But having data 
like
 >> this and a desire to model it doesn't make the quasi families real. 
  In his
 >> signature Thierry Onkelinx quotes
 >>
 >> The combination of some data and an aching desire for an answer does not
 >> ensure that a reasonable answer can be extracted from a given body 
of data.
 >> ~ John Tukey
 >>
 >> I could and do plan to incorporate the negative binomial family but, 
without
 >> a definition that I can understand of a quasi-binomial or quasi-Poisson
 >> distribution and its associated probability function, I'm stuck. To 
me it's
 >> a "build bricks without straw" situation - you can't find maximum 
likelihood
 >> estimates for parameters that aren't part of the likelihood.
 >>
 >> _______________________________________________
 >> R-sig-mixed-models at r-project.org mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
 >

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

-- 
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)		
1100 NE 45th Street, Suite 300 	
Seattle, WA  98105 	
206-616-3879 	
http://depts.washington.edu/cshrb/
(Mon-Wed)	

Center for Healthcare Improvement, for Addictions, Mental Illness,
   Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104
http://www.chammp.org
(Thurs)