[R-sig-ME] Binomial vs. logistic regression & the consequences of aggregation

David Winsemius dwinsemius at comcast.net
Tue Sep 20 04:55:53 CEST 2011


On Sep 19, 2011, at 6:44 PM, Jeremy Koster wrote:

> Imagine that I have observed 100 people on 50 separate occasions.   
> For each observation, I record whether they are smoking or not.  I  
> am interested in modeling the effect of age on the likelihood of  
> smoking.
>
> I could envision two ways of doing this, leaving the data in an  
> unaggregated format -- that is, a dataset with 5000 rows.  Then  
> specify a model with a random effect for individual, such as:
>
> smoking.logistic <- glmer(smoking ~ age + (1 | Individual),
>                           family = binomial)
>
> Alternatively, a colleague routinely aggregates data for each  
> individual, thus producing a dataset of 100 rows.  He then models  
> the effect of age by writing code:
>
> # two-column response is (successes, failures) for each individual
> smoking.binomial <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
>                         family = binomial)

Wouldn't this model ignore the lack of independence among each  
person's observations and unfairly inflate the confidence in the  
estimate, since it is just ordinary grouped-data input for logistic  
regression? I would have guessed that it would be more appropriately  
constructed as:

# one row per individual; smoking_obs and total_obs are per-person sums
smk.pois <- glm(smoking_obs ~ age + offset(log(total_obs)),
                family = poisson)

That way you have one observation per individual and the response is  
on the correct discrete scale.
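A minimal sketch of the point about grouped data, in R, on a made-up six-person data set (the names `agg`, `long`, `fit_bin`, `fit_long` and every value are hypothetical). A binomial `glm` on `cbind(successes, failures)` should give the same coefficients and standard errors as an ordinary logistic regression on the same data expanded back to one 0/1 row per observation, which suggests that grouping by itself does nothing about the correlation among one person's observations.

```r
# Hypothetical toy data: six people, 50 occasions each; values are made up
agg <- data.frame(Individual  = LETTERS[1:6],
                  age         = c(22, 31, 40, 48, 57, 66),
                  smoking_obs = c(31, 18, 27, 9, 22, 6),
                  total_obs   = 50)

# Grouped binomial: the two-column response is (successes, failures)
fit_bin <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
               family = binomial, data = agg)

# The same data expanded to one 0/1 row per observation (300 rows)
long <- data.frame(
  age     = rep(agg$age, agg$total_obs),
  smoking = rep(rep(c(1, 0), nrow(agg)),
                times = c(rbind(agg$smoking_obs,
                                agg$total_obs - agg$smoking_obs)))
)
fit_long <- glm(smoking ~ age, family = binomial, data = long)

# Same estimates and standard errors (up to IRLS convergence tolerance):
# grouping changes the data layout, not the model
cbind(grouped      = coef(fit_bin),
      ungrouped    = coef(fit_long),
      se_grouped   = sqrt(diag(vcov(fit_bin))),
      se_ungrouped = sqrt(diag(vcov(fit_long))))
```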

>
> I find this approach to be less intuitive, and I note that we get  
> very different results when switching from one to the other.  I lack  
> the statistical expertise to articulate the difference in the  
> estimation of these models, and I would appreciate references that  
> detail the consequences of using the different approaches.   
> Specifically, to what extent does the aggregation within individuals  
> obviate the need (if at all) for an individual-level random effect?

I do not think that the second one actually aggregates within  
individuals, since you were using the binomial family.
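The question of whether aggregation removes the need for the random effect can be probed with a sketch on simulated data, assuming 100 people, 50 occasions, age fixed within person, and a person-level random intercept with standard deviation 1.5 (all hypothetical settings). The within-person correlation does not disappear when the data are collapsed; it appears as a Pearson dispersion ratio well above 1 in the grouped binomial fit. The `quasibinomial` fit is my substitution, not something proposed in this thread; it inflates the standard errors by the square root of that ratio. If lme4 is installed, the random-intercept model from the original question is fit on the 5000 unaggregated rows. Its age coefficient is subject-specific, whereas the `glm` coefficient is population-averaged, and for a logistic link the two generally differ, which may explain part of the "very different results".

```r
# Hypothetical simulation; every setting below is an assumption
set.seed(42)
n_id  <- 100                          # people
n_occ <- 50                           # occasions per person
age_p <- round(runif(n_id, 18, 70))   # age, constant within person
u_p   <- rnorm(n_id, sd = 1.5)        # person-level random intercept
long <- data.frame(Individual = factor(rep(seq_len(n_id), each = n_occ)),
                   age        = rep(age_p, each = n_occ))
long$smoking <- rbinom(nrow(long), 1,
                       plogis(-1 + 0.03 * (long$age - 40) +
                              rep(u_p, each = n_occ)))

# Collapse to one row per person: successes and trials
agg <- data.frame(age         = age_p,
                  smoking_obs = as.vector(tapply(long$smoking, long$Individual, sum)),
                  total_obs   = n_occ)

fit_bin   <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
                 family = binomial, data = agg)
fit_quasi <- glm(cbind(smoking_obs, total_obs - smoking_obs) ~ age,
                 family = quasibinomial, data = agg)

# Pearson dispersion: near 1 if the binomial variance holds, well above 1
# when observations within a person are positively correlated
phi <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

se <- function(fit) summary(fit)$coefficients["age", "Std. Error"]
c(dispersion = phi, se_binomial = se(fit_bin), se_quasi = se(fit_quasi))

# Random-intercept logistic model on the unaggregated rows, if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_mm <- lme4::glmer(smoking ~ age + (1 | Individual),
                        family = binomial, data = long)
  print(lme4::fixef(fit_mm))
}
```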

>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

David Winsemius, MD
West Hartford, CT



