[R-sig-ME] advice on grouping structure - many levels but few individuals per level

Douglas Bates bates at stat.wisc.edu
Wed Apr 9 15:47:27 CEST 2008

On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
> Dear lmer's

>  I was hoping to get a little advice about specifying a grouping structure
>  with many levels but few (sometimes one) individual per level. I have had a
>  look through the posting archives but could not find a similar question.
>  Many apologies in advance if I have missed any.

>  The context of the question is as follows:

>  I would like to model fitness of juvenile birds (a simple weight based
>  metric) with a number of explanatory variables including: when they were
>  laid (as a Julian day - egglayed), number of nestlings in nest (nestlings)
>  and whether they are male or female (sex). Each bird obviously originates
>  from a nest with some birds originating from the same nest (siblings). As
>  there is the potential for the fitness of siblings to be similar (either due
>  to genetic or environmental effects) I would like to include nest as a
>  random effect to reflect this potential grouping structure. For example

>  model <- lmer(fitness ~ egglayed + nestlings + sex +(1|nest))

>  I have many nests (175) but about half of them contain only 1 individual.

>  My question is: does it make sense to include nest as a random effect given
>  that many nests only contain one individual? I know this probably reflects a
>  rather deep misunderstanding regarding mixed effects models on my part but I
>  would have thought that it would be impossible to estimate a within nest
>  variance with only one individual and therefore make my between nest
>  variance estimates meaningless.

That's not a problem as long as you recognize that you will get almost
no new information from the groups that have only one observation. In
other words, you will get almost the same parameter estimates from the
complete data set as you would from the data after eliminating
those nests with only one individual. If you wrote out all of the
error terms for each observation, you would see that for those nests
with only one observation the two error terms (the nest's random
effect and the per-observation residual) are confounded.
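
To make the confounding concrete, here is one way to write out the error terms for the random-intercept model in the original post. The notation is my own assumption, not taken from the post: i indexes nests, j indexes birds within a nest, b_i is the nest effect and \varepsilon_{ij} is the per-bird residual.

```latex
% Random-intercept model for bird j in nest i (notation assumed for illustration)
y_{ij} = x_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% Nest with a single bird (n_i = 1): only the sum of the two terms is observed
y_{i1} - x_{i1}^{\top}\beta = b_i + \varepsilon_{i1} \sim N(0,\ \sigma_b^2 + \sigma^2)

% Nest with two or more birds: siblings j \neq k share b_i
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2,
\qquad \operatorname{Var}(y_{ij}) = \sigma_b^2 + \sigma^2
```

A single-bird nest therefore tells you about the total variance \sigma_b^2 + \sigma^2 but not about how it splits into its two parts. The split is estimated from the within-nest covariance, which only nests with two or more birds supply.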

I have seen this effect when fitting models to the 'star' data set in
the mlmRev package. Because these are longitudinal data, groups are
indexed by individuals (students, in this case), and the number of
observations per group is the number of times the student takes a
test. Many students have only one observation. For most models you
can remove those students or keep them in without affecting the
parameter estimates noticeably.
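
If you want to check this on your own data, one approach is to fit the model twice, once with all nests and once with the single-bird nests dropped, and compare the estimates. The following is only a sketch on simulated data: the data frame `birds`, the nest sizes, and all of the coefficients and variances are made up for illustration, and only the variable names and the model formula come from the original post. It assumes the lme4 package is installed.

```r
# Sketch on simulated data (not the poster's data); requires lme4.
library(lme4)

set.seed(1)
n_nests <- 175
# Assumed nest sizes: about half the nests have one bird, the rest 2 to 5
nest_size <- ifelse(seq_len(n_nests) <= 88, 1L,
                    sample(2:5, n_nests, replace = TRUE))
n <- sum(nest_size)

birds <- data.frame(
  nest      = factor(rep(seq_len(n_nests), times = nest_size)),
  # Nest-level covariates, repeated for every bird in the nest
  egglayed  = rep(round(runif(n_nests, 100, 160)), times = nest_size),
  nestlings = rep(nest_size + sample(0:2, n_nests, replace = TRUE),
                  times = nest_size),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Made-up "true" values: nest effect sd 1.5, residual sd 1
b <- rep(rnorm(n_nests, sd = 1.5), times = nest_size)
birds$fitness <- 20 - 0.05 * birds$egglayed - 0.3 * birds$nestlings +
  0.5 * (birds$sex == "M") + b + rnorm(n, sd = 1)

# Fit 1: all nests, including those with a single bird
fit_all <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest),
                data = birds)

# Fit 2: only nests with two or more birds
tab   <- table(birds$nest)
multi <- droplevels(birds[birds$nest %in% names(tab)[tab > 1], ])
fit_multi <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest),
                  data = multi)

# Compare fixed effects side by side, then the variance components
print(cbind(all = fixef(fit_all), multi = fixef(fit_multi)))
print(VarCorr(fit_all))
print(VarCorr(fit_multi))
```

How close the two fits come out will depend on the data, so treat the comparison as a diagnostic rather than a guarantee.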

>  Many, many thanks for your advice in advance.
>  Best wishes
>  Martin
