[R-sig-ME] GEE vs. mixed-effects modeling for handling singletons in higher-level units

Fri Apr 8 15:34:32 CEST 2011

On Thu, Apr 7, 2011 at 8:57 PM, Jeremy Koster <helixed2 at yahoo.com> wrote:
> A statistically-minded colleague recently commented that, in datasets that have singletons in higher-level units, Generalized Estimating Equations might give more accurate estimates of fixed effects than random-effects modeling.
>
> The conversation emerged from the observation that some (anthropological) datasets include households with only one member.  That is, we have encountered datasets in which 250 households include more than one person whereas about 10-15 are single-person households.
>
> Are there pros (and cons) to either GEE or random-effects modeling in such cases?
>
> More generally, can anyone recommend references to literature on singletons in the context of mixed-effects modeling?  My impression is that it's still possible to specify random effects when some higher-level units include only one lower-level data point, but my colleague believes that it would be problematic.

It's possible to fit models with random effects to data sets with
singletons as you describe.  You must be aware, however, that such
units contribute little information about the random effects: with a
single observation in a unit, the unit-level variability and the
observation-level variability in the response cannot be separated
for that unit.
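
To make the point about singletons concrete, here is a small sketch
for the simpler linear (Gaussian) random-intercept model, treating the
fixed effects and variance components as known.  It is only an analogy
for models with a binary response, and the values of sigma_b and sigma
are made up for illustration.  For a unit with n observations, the
conditional mean of its random effect is the unit's mean deviation
multiplied by a shrinkage factor, and the conditional standard
deviation shrinks as n grows.

```r
# Hypothetical variance components for a linear random-intercept model
sigma_b <- 1                 # std. dev. of the unit-level random effect (assumed)
sigma   <- 2                 # residual std. dev. (assumed)
n       <- c(1, 2, 5, 20)    # number of observations in a unit

# Fraction of the unit's mean deviation retained by the conditional mean
shrink <- n * sigma_b^2 / (sigma^2 + n * sigma_b^2)

# Conditional standard deviation of the random effect given the data
cond_sd <- sqrt(sigma_b^2 * sigma^2 / (sigma^2 + n * sigma_b^2))

res <- data.frame(n = n, shrink = round(shrink, 3), cond_sd = round(cond_sd, 3))
print(res)
```

With these made-up values a singleton retains only one fifth of its
observed deviation, and its conditional standard deviation is close to
sigma_b itself, which is the "wide interval, centered near zero"
pattern.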

The guPrenat data set in the mlmRev package for R is such an example.
In most of the families (the family factor is called "mom") there is
only one child observed.  (There may be, and usually is, more than one
child in the family but the data are recorded for only one of the
children.)

> library(mlmRev)  # provides the guPrenat data
> with(guPrenat, table(table(mom)))  # number of families by children observed

  1   2   3   4
817 595 142   4

Adding a random effect for the family as well as the district produces
a model that can be fit and that does show substantial variability
associated with the family.  However, if you examine the prediction
intervals on the random effects closely, you will find that those for
families with only one child observed are very wide and centered
close to zero.
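
One way to check this is sketched below.  It is not necessarily the
model described above: it is an intercept-only logistic model, it
assumes the higher-level grouping factor in guPrenat is named
"cluster", and it assumes a version of lme4 recent enough to provide
the as.data.frame() method for ranef() results.  The object names fm,
re, mom_re and n_kids are just for illustration.

```r
library(lme4)     # glmer(), ranef()
library(mlmRev)   # guPrenat data

# Intercept-only logistic model with random effects for family (mom)
# and for the higher-level unit (assumed here to be `cluster`)
fm <- glmer(prenat ~ 1 + (1 | mom) + (1 | cluster),
            data = guPrenat, family = binomial)

# Conditional modes and conditional standard deviations, one row per level
re     <- as.data.frame(ranef(fm, condVar = TRUE))
mom_re <- re[re$grpvar == "mom", ]

# Attach the number of children observed in each family
n_kids   <- table(guPrenat$mom)
mom_re$n <- as.integer(n_kids[as.character(mom_re$grp)])

# Average conditional standard deviation, and spread of the conditional
# modes, by number of children observed
aggregate(condsd ~ n, data = mom_re, FUN = mean)
aggregate(condval ~ n, data = mom_re, FUN = function(v) max(abs(v)))
```

Comparing the rows for n = 1 with those for larger n shows how the
intervals for the singleton families compare with the rest.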