[R-sig-ME] Specifying outcome variable in binomial glmm: single responses vs cbind?

Mon Jul 4 20:11:55 CEST 2016

Hi Ben,
This thread is relevant in this regard:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q4/024241.html
At least on my machine, I found a substantial difference in the parameter
estimates between the two forms. The second (cbind) form seemed more
reliable than the first, as you'll see from the thread.
Do you get the same result?
Best wishes,
Malcolm

> Date: Sat, 2 Jul 2016 13:06:30 -0400
> From: Ben Bolker <bbolker at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Specifying outcome variable in binomial glmm:
>         single responses vs cbind?
>
>
>
> On 16-07-01 07:37 PM, a y wrote:
> > What is the difference between fitting a binomial glmm (without random
> > item effects) in the following two ways?
> >
> > 1.
> > Data formatted in the following way:
> >
> > (data_long)
> > ID    correct    condition    itemID
> > 1      1             A               i1
> > 1      0             A               i2
> > 1      1             A               i3
> > 1      1             A               i4
> > 2      0             B               i1
> > 2      1             B               i2
> > 2      1             B               i3
> > 2      0             B               i4
> >
> > Fitting a model without item random effects:
> >
> > glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)
> >
> >
> > 2.
> > Data formatted this way (summing over the correct responses):
> >
> > (data_short)
> > ID     sum_correct    condition     itemID
> > 1       3                      A                NA
> > 2       2                      B                NA
> >
> > Fitting the following model, assuming there were only 4 items (I've seen
> > dozens of examples like this):
> >
> > glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
> >       family = binomial, data = data_short)
> >
> > ---
> > I figured these models should be identical, but in my experience they are
> > very much not. What am I missing? When is the second (more) appropriate?
> >
> > Thanks for any help,
> > Andrew
> >
>
>   I believe they should give different likelihoods but identical
> parameter estimates, identical *differences* among likelihoods (i.e.
> among competing models fitted to the same data), etc. That is, the
> aggregated and disaggregated forms differ only by an additive constant
> in the log-likelihood. I would be very interested to see a
> counter-example to that statement!  In general, the second form should
> be quicker to fit, provide residuals that are easier to interpret, etc.
>
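
The additive-constant argument in Ben's reply can be written out as a short derivation. This is only a sketch. The notation is introduced here and does not come from the thread: y_ij is the 0/1 response of subject i on item j, n_i is the number of items, s_i is the sum of the y_ij, and b_i is a normally distributed random intercept. It also assumes that every predictor is constant within a subject (as `condition` is in the example data) and that there are no item-level terms.

```latex
% Conditional success probability for subject i, given random intercept b
p_i(b) = \operatorname{logit}^{-1}\!\left(x_i^\top \beta + b\right),
\qquad b_i \sim \mathcal{N}(0, \sigma^2)

% Disaggregated (Bernoulli) marginal likelihood: one factor per response
L_{\text{long}}(\beta, \sigma)
  = \prod_i \int p_i(b)^{s_i}\,\bigl(1 - p_i(b)\bigr)^{n_i - s_i}\,
    \phi(b; 0, \sigma^2)\, db

% Aggregated (binomial) marginal likelihood: the binomial coefficient does
% not depend on beta, sigma, or b, so it factors out of the integral
L_{\text{agg}}(\beta, \sigma)
  = \prod_i \binom{n_i}{s_i}
    \int p_i(b)^{s_i}\,\bigl(1 - p_i(b)\bigr)^{n_i - s_i}\,
    \phi(b; 0, \sigma^2)\, db

% Hence the log-likelihoods differ by a parameter-free constant
\log L_{\text{agg}}(\beta, \sigma)
  = \log L_{\text{long}}(\beta, \sigma) + \sum_i \log \binom{n_i}{s_i}
```

For the two subjects shown in the original question (3 of 4 and 2 of 4 correct), the constant is log C(4,3) + log C(4,2) = log 4 + log 6 = log 24, roughly 3.18. Because the constant contains no parameters, both forms have the same maximizer in exact arithmetic, so a large discrepancy in the fitted estimates would point to a numerical or convergence issue rather than to a difference between the models.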
