[R-sig-ME] Does glm (and glmer) no longer vectorize proportions when estimating a binomial GLM(M)?

Douglas Bates bates at stat.wisc.edu
Sat Apr 7 16:29:18 CEST 2012


On Thu, Apr 5, 2012 at 4:48 PM, Jeremy Koster <helixed2 at yahoo.com> wrote:
> A while ago, I posted this message to the listserv:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q3/006717.html
>
> Specifically, I had noted that this code:
>
> smoking.aggregated <- glmer (cbind(smoking observations, total observations) ~ AGE + (1|Individual), family = binomial, data = aggregated)
>
> generates the same estimates as this code, which simply uses an unaggregated vector of data with a binary outcome variable instead of the proportions via cbind:
>
>
> smoking.unaggregated <- glmer (smoking ~ AGE + (1|Individual), family = binomial, data = unaggregated)
>
>
> In response, Doug Bates described the underlying code and functions as a bit of a "hack" -- see: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q3/006724.html

That comment was with respect to the ways that generalized linear
model families are defined in R.  There are several functions in a
family that are mis-named for historical reasons.  Initial short-cuts
were later found not to be sufficiently general, so other functions
had to be added to the family.  The distinction between Bernoulli and
binomial responses is to some extent determined by the weights, but
there may be both prior weights and a binomial representation of
Bernoulli responses, so this mysterious 'n' vector was added, etc.

The misnomers are:
  - the dev.resids function, which is documented to return the deviance
residuals, doesn't.  It returns the squares of the deviance residuals.
  - the aic function doesn't return the AIC; it returns the deviance
(minus twice the log-likelihood, without the penalty term).
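
Both points can be checked by calling the family functions directly.
The following is only a sketch: the response and fitted values are
made up for illustration, and the component is spelled dev.resids in
the family object.

```r
# Sketch: inspect the binomial family's helper functions directly.
fam <- binomial()
y  <- c(0, 1, 1, 0, 1)            # Bernoulli responses (made up)
mu <- c(0.2, 0.7, 0.6, 0.4, 0.9)  # fitted probabilities (made up)
wt <- rep(1, length(y))           # prior weights

# dev.resids gives the squared deviance residuals, all non-negative.
d2 <- fam$dev.resids(y, mu, wt)
# Signed deviance residuals need the square root and the sign of (y - mu).
dres <- sign(y - mu) * sqrt(d2)

# aic gives minus twice the log-likelihood; glm.fit adds 2 * rank itself.
m2ll <- fam$aic(y, n = wt, mu = mu, wt = wt, dev = sum(d2))
ll   <- sum(dbinom(y, size = 1, prob = mu, log = TRUE))

stopifnot(all(d2 >= 0), isTRUE(all.equal(m2ll, -2 * ll)))
# For 0/1 responses the saturated log-likelihood is 0, so the sum of
# dev.resids and the aic value coincide.
stopifnot(isTRUE(all.equal(sum(d2), m2ll)))
```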

Once you filter through all this misdirection there is still a
difference in the deviance between the Bernoulli representation of the
data and the representation as binomial responses.  The deviance from
the Bernoulli representation is based on the likelihood of the
parameters for the particular order of the observations in the data.
The deviance from the binomial representation is based on the
likelihood for any of the (n_i choose k_i) possible orderings of
responses that are consistent with the binomial summary.
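
For a single group the size of that difference can be computed by
hand.  This is a minimal sketch with an arbitrary probability p and a
made-up sequence of responses; neither comes from a fitted model.

```r
# One group of n = 5 trials with k = 2 successes, in one particular order.
y <- c(1, 0, 0, 1, 0)
n <- length(y)
k <- sum(y)
p <- 0.3   # arbitrary success probability, for illustration only

# Bernoulli: log-likelihood of this specific sequence.
ll_bern <- sum(dbinom(y, size = 1, prob = p, log = TRUE))
# Binomial: log-likelihood of k successes in n trials, in any order.
ll_binom <- dbinom(k, size = n, prob = p, log = TRUE)

# The gap is log(choose(n, k)), which does not depend on p.
stopifnot(isTRUE(all.equal(ll_binom - ll_bern, lchoose(n, k))))
```

Because the gap does not involve p, it shifts the log-likelihood
without moving the location of its maximum.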

At one point I thought that the deviance for the two representations
should be the same but now I have convinced myself that there is a
good reason for them to be different.

> Well, I no longer get the same estimates when using the two above lines of code, which along with Doug's comment makes me wonder if subsequent versions of the base and lme4 packages now treat these models differently.

Can you provide an example where you get different parameter
estimates?  As far as I know the parameter estimates should be the
same; it is just the log-likelihood and quantities derived from it
(deviance, AIC, BIC) that are different.
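
A small simulated check along these lines, using glm rather than glmer
so that it runs in base R.  The data, the variable not_smoking and the
coefficients in the simulation are all hypothetical.  Note that the
two-column response is read as cbind(successes, failures), not
cbind(successes, total).

```r
set.seed(1)
# Hypothetical unaggregated data: 20 individuals, 10 binary observations each.
unaggregated <- data.frame(Individual = rep(1:20, each = 10))
unaggregated$AGE <- rep(round(runif(20, 18, 60)), each = 10)
unaggregated$smoking <- rbinom(nrow(unaggregated), 1,
                               plogis(-2 + 0.04 * unaggregated$AGE))

# Aggregate to one row per individual: counts of successes and failures.
aggregated <- aggregate(smoking ~ Individual + AGE,
                        data = unaggregated, FUN = sum)
aggregated$not_smoking <- 10 - aggregated$smoking

fit_bern  <- glm(smoking ~ AGE, family = binomial, data = unaggregated)
fit_binom <- glm(cbind(smoking, not_smoking) ~ AGE,
                 family = binomial, data = aggregated)

# Parameter estimates agree up to the convergence tolerance.
coef_diff <- max(abs(coef(fit_bern) - coef(fit_binom)))

# The log-likelihoods differ by the sum of the log binomial coefficients.
ll_gap      <- as.numeric(logLik(fit_binom)) - as.numeric(logLik(fit_bern))
ll_expected <- sum(lchoose(10, aggregated$smoking))
```

If the two fits in your data give visibly different coefficients, a
comparison like coef_diff on a reduced example would help to locate
the problem.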



