[R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
bates at stat.wisc.edu
Wed May 20 19:16:03 CEST 2009
On Wed, May 20, 2009 at 12:08 PM, Robert ESPESSER
<Robert.Espesser at lpl-aix.fr> wrote:
> Dear all,
> I ran the model on the data "data" expanded into binary format, and I get the same results as
> in tabular format (with the correct number of observations).
> Does it mean that the sentence "the code is wrong" applies only to the calculation of the number of observations?
Yes, that is what I meant. Sorry for alarming you.
(Well, actually there is another sense in which the calculations on
the collapsed data give a different result. The calculated values of
the deviance, AIC and BIC are different for the extended and collapsed
data set and they should be the same.)
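(The same offset can be seen with plain glm(), without any random effects: the aggregated two-column fit and the expanded 0/1 fit give identical estimates, but their log-likelihoods differ by the binomial normalizing constant sum(log choose(n_i, s_i)), which the Bernoulli form drops. A base-R sketch, using the free-throw data from later in this thread:

```r
# Free-throw data (Agresti, Table 10.2)
n       <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
p       <- c(.769,.9,.667,.643,.667,.9,.6,1,.545,.9,.5,.889,.778,.625,.167)
success <- round(n * p)
fail    <- n - success

# Aggregated (two-column response) versus expanded (one 0/1 row per throw);
# for an intercept-only model the row order of the expanded form is irrelevant
fit_agg <- glm(cbind(success, fail) ~ 1, family = binomial)
fit_exp <- glm(c(rep(1, sum(success)), rep(0, sum(fail))) ~ 1,
               family = binomial)

coef(fit_agg)                 # same intercept as fit_exp
as.numeric(logLik(fit_agg)) - as.numeric(logLik(fit_exp))
sum(lchoose(n, success))      # the log-likelihoods differ by exactly this constant
```

The constant does not affect the parameter estimates, only the reported deviance, AIC and BIC.)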
> Thank you for reassuring me!
> # dbin is the dataframe for the data in binary format
>       id         rep
>  3  :15     fail: 42
>  4  :14     suc :101
>  1  :13
>  9  :11
>  10 :10
>  2  :10
>> glmer(rep ~ 1 + (1 | id), family="binomial",data=dbin) -> m1.dbin
> Generalized linear mixed model fit by the Laplace approximation
> Formula: rep ~ 1 + (1 | id)
> Data: dbin
> AIC BIC logLik deviance
> 176.8 182.7 -86.38 172.8
> Random effects:
> Groups Name Variance Std.Dev.
> id (Intercept) 0.16121 0.40151
> Number of obs: 143, groups: id, 15
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9058 0.2139 4.233 2.3e-05 ***
>> From: Douglas Bates <bates at stat.wisc.edu>
>> Sent: Wed May 20 14:24:16 CEST 2009
>> To: Dale Steele <dale.w.steele at gmail.com>
>> Subject: Re: [R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
>> On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
>> > A recent thread
>> > <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html>
>> > suggested that glmer can handle data in tabular form. I'm working
>> > through (not homework) the simple random effects example from Agresti,
>> > An Introduction to Categorical Data Analysis, 2nd Ed. (pg 303 - 304)
>> > and am having problems...
>> > # Data from Table 10.2 n_i: number of free throws, p_i: observed
>> > proportion of successes
>> > id <- seq(1:15)
>> > n <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
>> > p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889,
>> > .778, .625, .167)
>> > (success <- round(n * p))
>> > fail <- n - success
>> > data <- cbind(success, fail)
>> > # Model to fit: logit(pi_i) = mu_i + alpha
>> > library(lme4)
>> > # Model
>> > m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
>> > summary(m1)
>> nAGQ = 50 is overkill in this case but the model is sufficiently
>> simple that it doesn't do much harm. Gauss-Hermite quadrature without
>> centering and scaling (which is the "adaptive" part) could require a
>> huge number of quadrature points because most of the evaluations of
>> the unscaled density are wasted. In adaptive Gauss-Hermite
>> quadrature, however, you get very close approximations for a small
>> number of evaluations. I would rarely use more than nAGQ = 9. (Also,
>> the number of quadrature points is always chosen to be odd so your 50
>> will be increased to 51. With an odd number of points you get one
>> evaluation free - the one at a displacement of zero.)
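[Editor's note: the zero node for an odd number of points can be checked without lme4. Classical Gauss-Hermite nodes are the eigenvalues of a symmetric tridiagonal Jacobi matrix (the Golub-Welsch construction), which base R can compute; this is a sketch of the quadrature rule itself, not of lme4's implementation.]

```r
# Gauss-Hermite nodes via Golub-Welsch (base R only): the nodes are the
# eigenvalues of a symmetric tridiagonal matrix with zero diagonal and
# off-diagonal entries sqrt(k/2), k = 1, ..., n-1.
gh_nodes <- function(n) {
  J <- matrix(0, n, n)
  k <- seq_len(n - 1)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  sort(eigen(J, symmetric = TRUE, only.values = TRUE)$values)
}

gh_nodes(9)   # odd n: one node sits exactly at zero -- the "free" evaluation
gh_nodes(8)   # even n: no node at zero
```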
>> > The code runs, and produces results similar, but not exactly what
>> > Agresti gets (alpha_hat=0.908 and sigma_hat=0.422). Shouldn't the
>> > 'number of obs' be 143 rather than 15? Am I doing something wrong?
>> No, it's the code that is wrong. At last summer's UseR Conference
>> Andrew Gelman said that he was looking at the code for the glm
>> function and he was frustrated that the code was so long and involved
>> to be able to handle all the special cases. This is a prime example
>> of why the code gets complicated.
>> I know that it is convenient and natural to want to list the number of
>> successes and failures instead of one row for each observation but
>> doing that requires a massive amount of really ugly code to get it
>> right. (Look at the definitions of the various glm families some day.
>> There is this very peculiar "n" object squirreled away in a common
>> environment of the family functions specifically for this one case -
>> it is never used otherwise and, in fact, wasn't even defined in other
>> cases until recently.)
>> All the structures in lmer/glmer/nlmer are based on the not
>> unreasonable assumption that the model frame is a model frame, meaning
>> that there is one observation per row. Except that isn't the case
>> here. Getting the count right would mean writing a large amount of
>> special case code to recognize this case and regenerate the original
>> number of observations. It can be done but it is messy and low on the
>> priority list right now.
>> >> m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
>> >> summary(m1)
>> > Generalized linear mixed model fit by the adaptive Gaussian Hermite
>> > approximation
>> > Formula: data ~ 1 + (1 | id)
>> > AIC BIC logLik deviance
>> > 32.8 34.21 -14.4 28.8
>> > Random effects:
>> > Groups Name Variance Std.Dev.
>> > id (Intercept) 0.16258 0.40321
>> > Number of obs: 15, groups: id, 15
>> > Fixed effects:
>> > Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) 0.9059 0.2142 4.23 2.34e-05 ***
>> > I tried to run the same model by expanding the tabular data. Easy
>> > enough to expand the id variable.
>> > #
>> > (subj <- rep(id,n))
>> > However, I'm stuck on how to expand outcome variable for each player
>> > and concatenate as a single vector...
>> > # first player
>> > oc1 <- rep(c(1,0), sf[1,])
>> > Appreciate any insight anyone may provide on how to use tabular data
>> > directly and how to expand tabular data. Thanks! --Dale
>> I enclose a modified version of your script to do that. The general
>> idea is to produce all the 1 responses, then produce all the zero
>> responses then rbind them.
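[Editor's note: the attached script does not survive in the archive. Following the description above, a minimal base-R sketch of that expansion (the names `dbin`, `ones` and `zeros` are mine) might be:]

```r
id <- 1:15
n  <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
p  <- c(.769,.9,.667,.643,.667,.9,.6,1,.545,.9,.5,.889,.778,.625,.167)
success <- round(n * p)
fail    <- n - success

# all the 1 responses, then all the 0 responses, then stack them
ones  <- data.frame(id = rep(id, success), rep = 1)
zeros <- data.frame(id = rep(id, fail),    rep = 0)
dbin  <- rbind(ones, zeros)

nrow(dbin)      # 143 rows, one per free throw
sum(dbin$rep)   # 101 successes (so 42 failures), matching the summary above
```

Fitting `glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin)` on this expanded frame then reports the expected 143 observations.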