[R-sig-ME] Binomial vs. logistic regression & the consequences of aggregation
Jeremy Koster
helixed2 at yahoo.com
Tue Sep 20 00:44:10 CEST 2011
Imagine that I have observed 100 people on 50 separate occasions. For each observation, I record whether they are smoking or not. I am interested in modeling the effect of age on the likelihood of smoking.
I can envision two ways of doing this. The first leaves the data unaggregated, that is, as a dataset with 5000 rows (100 individuals x 50 observations), and specifies a model with a random effect for individual, such as:
library(lme4)  # provides glmer()
smoking.logistic <- glmer(smoking ~ age + (1 | Individual), family = binomial)
Alternatively, a colleague routinely aggregates data for each individual, thus producing a dataset of 100 rows. He then models the effect of age by writing code:
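# the two-column response is cbind(successes, failures), not cbind(successes, total)
smoking.binomial <- glm(cbind(smoking.observations, total.observations - smoking.observations) ~ age, family = binomial)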
I find this approach to be less intuitive, and I note that we get very different results when switching from one to the other. I lack the statistical expertise to articulate the difference in the estimation of these models, and I would appreciate references that detail the consequences of using the different approaches. Specifically, to what extent does the aggregation within individuals obviate the need (if at all) for an individual-level random effect?
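To make the two data layouts concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the seed, the age range, the coefficients, the spread of the individual-level effect, and the object names (long, agg, fit.long, fit.agg, fit.glmm) are all hypothetical. It assumes age is constant within an individual. Under that assumption the fixed-effects-only binomial fit should give the same coefficients whether or not the rows are aggregated, which would suggest that a difference between the two models above comes from the random effect rather than from the aggregation itself.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n.ind <- 100                       # individuals
n.obs <- 50                        # observations per individual

# Hypothetical simulated data: one age per individual plus individual-level variation
age <- round(runif(n.ind, 18, 70))
u   <- rnorm(n.ind, sd = 1)        # assumed individual effect on the logit scale

long <- data.frame(
  Individual = factor(rep(seq_len(n.ind), each = n.obs)),
  age        = rep(age, each = n.obs)
)
p <- plogis(-1 - 0.02 * long$age + rep(u, each = n.obs))
long$smoking <- rbinom(nrow(long), size = 1, prob = p)

# Aggregate to one row per individual (100 rows)
agg <- aggregate(smoking ~ Individual + age, data = long, FUN = sum)
names(agg)[names(agg) == "smoking"] <- "smoking.observations"
agg$total.observations <- n.obs

# Same fixed-effects-only model, fitted to both layouts
fit.long <- glm(smoking ~ age, family = binomial, data = long)
fit.agg  <- glm(cbind(smoking.observations,
                      total.observations - smoking.observations) ~ age,
                family = binomial, data = agg)

# Compare the coefficient estimates from the two layouts
print(all.equal(coef(fit.long), coef(fit.agg), tolerance = 1e-4))

# The aggregated layout can still carry an individual-level random effect
# (runs only if the lme4 package is installed)
if (requireNamespace("lme4", quietly = TRUE)) {
  fit.glmm <- lme4::glmer(cbind(smoking.observations,
                                total.observations - smoking.observations) ~
                            age + (1 | Individual),
                          family = binomial, data = agg)
  print(lme4::fixef(fit.glmm))
}
```

If the sketch behaves as expected, the comparison to look at is fit.agg against fit.glmm: both use the same 100 aggregated rows, so any difference in the age estimate or its standard error reflects the random intercept and not the data layout.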