[R-sig-ME] Binomial vs. logistic regression & the consequences of aggregation
David Winsemius
dwinsemius at comcast.net
Tue Sep 20 04:55:53 CEST 2011
On Sep 19, 2011, at 6:44 PM, Jeremy Koster wrote:
> Imagine that I have observed 100 people on 50 separate occasions.
> For each observation, I record whether they are smoking or not. I
> am interested in modeling the effect of age on the likelihood of
> smoking.
>
> I could envision two ways of doing this, leaving the data in an
> unaggregated format -- that is, a dataset with 5000 rows. Then
> specify a model with a random effect for individual, such as:
>
> smoking.logistic <- glmer (smoking ~ age + (1|Individual), family =
> binomial)
>
> Alternatively, a colleague routinely aggregates data for each
> individual, thus producing a dataset of 100 rows. He then models
> the effect of age by writing code:
>
> smoking.binomial <- glm (cbind(smoking observations, total
> observations) ~ age, family = binomial)
Wouldn't this model ignore the lack of independence in the observations
and unfairly inflate the confidence in the estimate, since that is an
ordinary grouped data input for logistic regression? I would have
guessed that it would be more appropriately constructed as:
smk.pois <- glm(smoking_obs ~ age + offset(log(total_obs)),
                family = poisson)  # smoking_obs, total_obs: per-individual sums
That way you have one observation per individual and the response is
on the correct discrete scale.
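
To make the grouped-data point concrete, here is a minimal sketch in base R on simulated data. Everything in it is an assumption made up for illustration (the data frames `long` and `agg`, the effect sizes, the seed); none of it comes from the original question. It fits the plain logistic glm to the 5000 rows and the grouped binomial glm to the 100 aggregated rows, compares their estimates, checks the Pearson dispersion, and fits the Poisson-with-offset form. Note that glm() expects the two-column response as cbind(successes, failures).

```r
# Simulated data: all names, effect sizes and the seed are illustrative assumptions.
set.seed(1)
n_id  <- 100   # individuals
n_occ <- 50    # occasions per individual

age <- round(runif(n_id, 18, 70))
b_i <- rnorm(n_id, sd = 1)                    # unobserved individual heterogeneity
p_i <- plogis(-1 - 0.03 * (age - 40) + b_i)   # per-person probability of smoking

# Unaggregated data: one row per observation (5000 rows)
long <- data.frame(
  Individual = factor(rep(seq_len(n_id), each = n_occ)),
  age        = rep(age, each = n_occ)
)
long$smoking <- rbinom(nrow(long), size = 1, prob = rep(p_i, each = n_occ))

# Aggregated data: one row per individual (100 rows)
agg <- data.frame(
  age         = age,
  smoking_obs = as.vector(tapply(long$smoking, long$Individual, sum)),
  total_obs   = n_occ
)

# Ordinary logistic regression on the raw rows (no random effect)
fit_long <- glm(smoking ~ age, family = binomial, data = long)

# Grouped binomial regression; response is cbind(successes, failures)
fit_agg <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
               family = binomial, data = agg)

# The two fits maximise the same likelihood in the coefficients,
# so estimates and standard errors should agree up to convergence tolerance.
se_long   <- sqrt(diag(vcov(fit_long)))
se_agg    <- sqrt(diag(vcov(fit_agg)))
coef_diff <- max(abs(coef(fit_long) - coef(fit_agg)))
se_diff   <- max(abs(se_long - se_agg))

# Pearson dispersion of the grouped fit; values well above 1 suggest
# extra-binomial variation between individuals.
dispersion <- sum(residuals(fit_agg, type = "pearson")^2) / df.residual(fit_agg)

# One crude adjustment: quasibinomial rescales the standard errors
fit_quasi <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
                 family = quasibinomial, data = agg)
se_quasi  <- sqrt(diag(vcov(fit_quasi)))

# Poisson form with an offset: log link, so it models a rate rather than odds
fit_pois <- glm(smoking_obs ~ age + offset(log(total_obs)),
                family = poisson, data = agg)

print(rbind(long = se_long, grouped = se_agg, quasi = se_quasi))
print(dispersion)
```

On data simulated this way, the two binomial fits should agree in both coefficients and standard errors, which is the sense in which aggregation by itself does nothing about the dependence within individuals. The dispersion should come out above 1 here because the simulation builds in between-person variation; with real data it would have to be checked rather than assumed.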
>
> I find this approach to be less intuitive, and I note that we get
> very different results when switching from one to the other. I lack
> the statistical expertise to articulate the difference in the
> estimation of these models, and I would appreciate references that
> detail the consequences of using the different approaches.
> Specifically, to what extent does the aggregation within individuals
> obviate the need (if at all) for an individual-level random effect?
I do not think that the second one actually aggregates within
individuals, since you were using the binomial family.
David Winsemius, MD
West Hartford, CT