[R-sig-ME] Sample size and mixed models

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Sun Dec 14 05:25:24 CET 2008

On Sat, Dec 13, 2008 at 10:37:22PM -0500, Robert Kushler wrote:
> Section 4.5.3 of Agresti's Categorical Data Analysis (pages 140-141 of the
> second edition) discusses "grouped vs ungrouped" for the binomial case.
> The same issue arises in Poisson models for count data.  All individuals
> with the same "covariate pattern" can be collapsed to a single record, with
> the sum of the counts as the response and an offset for the sample size, and
> the same fitted model is obtained.
> Agresti refers to two different versions of the "saturated" model, but I
> like to reserve the term "saturated" for the model that fits the grouped
> data perfectly and call the other the "perfect" model (since it predicts
> all the individuals correctly).
> Nagelkerke's R^2 will be larger when computed using the grouped data
> likelihood, but that's because the "saturated" model is the definition
> of perfection in that case.  This is analogous to defining the model
> "y ~ factor(x)" as perfect when assessing "y ~ x" - you're throwing away
> the "within groups" sum of squares and treating the "between groups" sum
> of squares as the total.
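
To make the grouped vs ungrouped distinction concrete, here is a minimal Python sketch. The data, the parameter values, and every function name are made up for illustration; none of them come from Agresti or from this thread. It checks two things: the grouped (binomial) and ungrouped (Bernoulli) log-likelihoods differ only by a constant that does not depend on the parameters, so they give the same fit, and the deviance changes depending on whether the reference is the "saturated" or the "perfect" model.

```python
from math import comb, exp, log

# Hypothetical grouped data: (x, successes, trials) for each covariate pattern.
GROUPS = [(0.0, 2, 10), (1.0, 5, 10), (2.0, 9, 10)]


def prob(b0, b1, x):
    """Logistic success probability at covariate value x."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))


def loglik_ungrouped(b0, b1):
    """Bernoulli log-likelihood, one term per individual."""
    total = 0.0
    for x, s, t in GROUPS:
        p = prob(b0, b1, x)
        total += s * log(p) + (t - s) * log(1.0 - p)
    return total


def loglik_grouped(b0, b1):
    """Binomial log-likelihood, one term per covariate pattern."""
    const = sum(log(comb(t, s)) for _, s, t in GROUPS)
    return const + loglik_ungrouped(b0, b1)


def kernel_at_observed_proportions():
    """Bernoulli kernel evaluated at each pattern's observed proportion."""
    total = 0.0
    for _, s, t in GROUPS:
        for k in (s, t - s):
            if k > 0:  # treat 0 * log(0) as 0
                total += k * log(k / t)
    return total


def deviance_ungrouped(b0, b1):
    """Reference is the "perfect" model, whose log-likelihood is 0."""
    return 2.0 * (0.0 - loglik_ungrouped(b0, b1))


def deviance_grouped(b0, b1):
    """Reference is the "saturated" model; the binomial constants cancel."""
    return 2.0 * (kernel_at_observed_proportions() - loglik_ungrouped(b0, b1))


# Two arbitrary parameter settings (not fitted values).
for b0, b1 in [(-1.5, 1.8), (0.0, 0.5)]:
    gap = loglik_grouped(b0, b1) - loglik_ungrouped(b0, b1)
    print(f"b0={b0}, b1={b1}: gap={gap:.4f}, "
          f"ungrouped deviance={deviance_ungrouped(b0, b1):.3f}, "
          f"grouped deviance={deviance_grouped(b0, b1):.3f}")
```

The printed gap should be identical for both parameter settings, while the ungrouped deviance is always the larger of the two because its reference model predicts every individual exactly.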

I see your logic, but the original question referred to the sample
size, rather than the likelihood.  That was why I was surprised: using
the ungrouped sample size fails to account for within-cluster
correlation.

> Strictly speaking, the choice of which version of n to use should
> probably not be made independently of this issue.  If the count of
> individuals is used with the grouped data likelihood it reduces the
> amount by which the R^2 value is inflated, which is my (admittedly
> weak) reason for the blanket recommendation.

My tentative prescription would be conservative: the ungrouped loss
function (however determined) and the grouped sample size.  It's a
pretty ugly hack, though.
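
For anyone who wants to see how much the choice matters, here is a rough Python sketch that tabulates Nagelkerke's R^2 for all four pairings of likelihood (grouped or ungrouped) and sample size (individuals or covariate patterns). The data are invented, the function names are hypothetical, and I am assuming the usual form of the statistic: the Cox-Snell R^2 divided by its maximum attainable value. The model is y ~ a on data with patterns defined by (a, b), chosen only because its MLEs are closed-form pooled proportions, so no fitting routine is needed.

```python
from math import comb, exp, log

# Hypothetical data: (a, b, successes, trials) for each covariate pattern.
PATTERNS = [(0, 0, 3, 10), (0, 1, 5, 10), (1, 0, 6, 10), (1, 1, 9, 10)]

N_INDIVIDUALS = sum(t for _, _, _, t in PATTERNS)  # ungrouped sample size
N_PATTERNS = len(PATTERNS)                         # grouped sample size
# Constant separating the grouped from the ungrouped log-likelihood.
BINOM_CONST = sum(log(comb(t, s)) for _, _, s, t in PATTERNS)


def bernoulli_loglik(p_of_a):
    """Ungrouped log-likelihood given a success probability per level of a."""
    return sum(s * log(p_of_a[a]) + (t - s) * log(1.0 - p_of_a[a])
               for a, _, s, t in PATTERNS)


def pooled_rate(level=None):
    """Closed-form MLE: pooled success rate, optionally within one level of a."""
    rows = [(s, t) for a, _, s, t in PATTERNS if level is None or a == level]
    return sum(s for s, _ in rows) / sum(t for _, t in rows)


def nagelkerke(ll_null, ll_model, n):
    """Cox-Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1.0 - exp(-2.0 * (ll_model - ll_null) / n)
    return cox_snell / (1.0 - exp(2.0 * ll_null / n))


p_null = pooled_rate()
ll_null = bernoulli_loglik({0: p_null, 1: p_null})                   # y ~ 1
ll_model = bernoulli_loglik({0: pooled_rate(0), 1: pooled_rate(1)})  # y ~ a

r2 = {}
for lik, const in [("ungrouped", 0.0), ("grouped", BINOM_CONST)]:
    for size, n in [("individuals", N_INDIVIDUALS), ("patterns", N_PATTERNS)]:
        r2[lik, size] = nagelkerke(ll_null + const, ll_model + const, n)
        print(f"{lik:>9} likelihood, n = {n:>2} ({size}): "
              f"R^2 = {r2[lik, size]:.3f}")
```

The "ungrouped likelihood, n = patterns" row corresponds to the prescription above, and the "grouped likelihood, n = individuals" row to the blanket recommendation in the quoted text. For a fixed n the grouped likelihood can only give the larger value, because the binomial constant cancels in the numerator but shrinks the denominator.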



Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
