[R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
Douglas Bates
bates at stat.wisc.edu
Wed May 20 14:24:16 CEST 2009
On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
> A recent thread
> <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html>
> suggested the glmer can handle data in tabular form. I'm working
> through (not homework) the simple random effects example from Agresti,
> An Introduction to Categorical Data Analysis, 2nd Ed. (pg 303 - 304)
> and am having problems...
> # Data from Table 10.2 n_i: number of free throws, p_i: observed
> proportion of successes
> id <- seq(1:15)
> n <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
> p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889,
> .778, .625, .167)
> (success <- round(n * p))
> fail <- n - success
> data <- cbind(success, fail)
> # Model to fit: logit(pi_i) = mu_i + alpha
> library(lme4)
> # Model
> m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
> summary(m1)
nAGQ = 50 is overkill in this case but the model is sufficiently
simple that it doesn't do much harm. Gauss-Hermite quadrature without
centering and scaling (which is the "adaptive" part) could require a
huge number of quadrature points because most of the evaluations of
the unscaled density are wasted. In adaptive Gauss-Hermite
quadrature, however, you get very close approximations for a small
number of evaluations. I would rarely use more than nAGQ = 9. (Also,
the number of quadrature points is always chosen to be odd so your 50
will be increased to 51. With an odd number of points you get one
evaluation free - the one at a displacement of zero.)
> The code runs, and produces results similar, but not exactly what
> Agresti gets (alpha_hat=0.908 and sigma_hat=0.422). Shouldn't the
> 'number of obs' be 143 rather than 15? Am I doing something wrong?
No, it's the code that is wrong. At last summer's UseR Conference
Andrew Gelman said that he was looking at the code for the glm
function and he was frustrated that the code was so long and involved
to be able to handle all the special cases. This is a prime example
of why the code gets complicated.
I know that it is convenient and natural to want to list the number of
successes and failures instead of one row for each observation but
doing that requires a massive amount of really ugly code to get it
right. (Look at the definitions of the various glm families some day.
There is this very peculiar "n" object squirreled away in a common
environment of the family functions specifically for this one case -
it is never used otherwise and, in fact, wasn't even defined in other
cases until recently.)
All the structures in lmer/glmer/nlmer are based on the not
unreasonable assumption that the model frame is a model frame, meaning
that there is one observation per row. Except that isn't the case
here. Getting the count right would mean writing a large amount of
special case code to recognize this case and regenerate the original
number of observations. It can be done but it is messy and low on the
priority list right now.
> > summary(m1)
>> m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
>> summary(m1)
>
> Generalized linear mixed model fit by the adaptive Gaussian Hermite
> approximation
> Formula: data ~ 1 + (1 | id)
> AIC BIC logLik deviance
> 32.8 34.21 -14.4 28.8
> Random effects:
> Groups Name Variance Std.Dev.
> id (Intercept) 0.16258 0.40321
> Number of obs: 15, groups: id, 15
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9059 0.2142 4.23 2.34e-05 ***
>
> I tried to run the same model by expanding the tabular data. Easy
> enough to expand the id variable.
> #
> (subj <- rep(id,n))
>
> However, I'm stuck on how to expand outcome variable for each player
> and concatenate as a single vector...
> # first player
> oc1 <- rep(c(1,0), sf[1,])
>
> Appreciate any insight anyone may provide on how to use tabular data
> directly and how to expand tubular data. Thanks! --Dale
I enclose a modified version of your script to do that. The general
idea is to produce all the 1 responses, then produce all the zero
responses then rbind them.
More information about the R-sig-mixed-models
mailing list