[R-sig-ME] Specifying outcome variable in binomial glmm: single responses vs cbind?

Sat Jul 2 19:06:30 CEST 2016

On 16-07-01 07:37 PM, a y wrote:
> What is the difference between fitting a binomial glmm (without random item
> effects) in the following two ways?
> 
> 1.
> Data formatted in the following way:
> 
> (data_long)
> ID  correct  condition  itemID
> 1   1        A          i1
> 1   0        A          i2
> 1   1        A          i3
> 1   1        A          i4
> 2   0        B          i1
> 2   1        B          i2
> 2   1        B          i3
> 2   0        B          i4
> 
> Fitting a model without item random effects:
> 
> glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)
> 
> 
> 2.
> Data formatted this way (summing over the correct responses):
> 
> (data_short)
> ID  sum_correct  condition  itemID
> 1   3            A          NA
> 2   2            B          NA
> 
> Fitting the following model, assuming there were only 4 items (I've seen
> dozens of examples like this):
> glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
>       family = binomial, data = data_short)
> 
> ---
> I figured these models should be identical, but in my experience they are
> very much not. What am I missing? When is the second form (more) appropriate?
> 
> Thanks for any help,
> Andrew
> 
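A minimal sketch, in base R, of deriving the short layout from the long one so that both fits see exactly the same responses. The helper column `trial` and the count column `n_trials` are hypothetical names introduced here; using `n_trials` instead of a hard-coded 4 would avoid a silent mismatch if some IDs answered a different number of items.

```r
# Long format: one row per trial, as in the question
data_long <- data.frame(
  ID        = factor(rep(1:2, each = 4)),
  correct   = c(1, 0, 1, 1, 0, 1, 1, 0),
  condition = factor(rep(c("A", "B"), each = 4)),
  itemID    = factor(rep(paste0("i", 1:4), times = 2))
)

# Hypothetical helper column: each row counts as one trial
data_long$trial <- 1

# Sum successes and trials within each ID x condition cell;
# combinations with no rows (e.g. ID 1 in condition B) are not returned
data_short <- aggregate(cbind(correct, trial) ~ ID + condition,
                        data = data_long, FUN = sum)
names(data_short)[names(data_short) == "correct"] <- "sum_correct"
names(data_short)[names(data_short) == "trial"] <- "n_trials"
print(data_short)

# The aggregated model would then be (requires lme4):
# glmer(cbind(sum_correct, n_trials - sum_correct) ~ condition + (1 | ID),
#       family = binomial, data = data_short)
```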

  I believe they should give different likelihoods but identical
parameter estimates, identical *differences* among likelihoods (i.e.,
among competing models fitted to the same data), and so on. That is,
disaggregating the data changes the log-likelihood only by an additive
constant: the aggregated binomial likelihood includes a factor
choose(n, k) for each row that the per-trial Bernoulli likelihood lacks,
and that factor does not involve the parameters. I would be very
interested to see a counter-example to that statement! In general, the
second form should be quicker to fit, provide residuals that are easier
to interpret, etc.
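A quick numerical check of this claim on the toy data from the question. Because two IDs are far too few to estimate a random-effect variance, this sketch uses plain `glm()` without the `(1|ID)` term as a stand-in for `glmer()`, so it checks the likelihood algebra rather than lme4 itself: for k successes in n trials, dbinom(k, n, p) = choose(n, k) p^k (1-p)^(n-k), while the n Bernoulli rows contribute only p^k (1-p)^(n-k).

```r
# Same responses in both layouts (toy data from the question)
data_long <- data.frame(
  ID        = factor(rep(1:2, each = 4)),
  correct   = c(1, 0, 1, 1, 0, 1, 1, 0),
  condition = factor(rep(c("A", "B"), each = 4))
)
data_short <- data.frame(
  ID          = factor(1:2),
  sum_correct = c(3, 2),
  condition   = factor(c("A", "B"))
)

# Fixed-effects-only fits (stand-in for glmer; no random effect)
fit_long  <- glm(correct ~ condition, family = binomial, data = data_long)
fit_short <- glm(cbind(sum_correct, 4 - sum_correct) ~ condition,
                 family = binomial, data = data_short)

# Coefficients should agree up to convergence tolerance
coef_diff <- coef(fit_long) - coef(fit_short)

# Log-likelihoods should differ by the summed log binomial coefficients
ll_diff <- as.numeric(logLik(fit_short)) - as.numeric(logLik(fit_long))
const   <- sum(lchoose(4, data_short$sum_correct))

print(coef_diff)
print(c(ll_diff = ll_diff, const = const))
```

With these data the log-likelihood gap should equal lchoose(4, 3) + lchoose(4, 2) = log(24), while the coefficient differences are zero apart from numerical noise.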