[R-sig-ME] advice on grouping structure - many levels but few individuals per level

Thu Apr 10 00:14:44 CEST 2008

Doug,

On Wed, Apr 09, 2008 at 08:47:27AM -0500, Douglas Bates wrote:
> On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
> > Dear lmers,
> 
> >  I was hoping to get a little advice about specifying a grouping structure
> >  with many levels but few (sometimes one) individuals per level. I have had a
> >  look through the posting archives but could not find a similar question.
> >  Many apologies in advance if I have missed any.
> 
> >  The context of the question is as follows:
> 
> >  I would like to model fitness of juvenile birds (a simple weight based
> >  metric) with a number of explanatory variables including: when they were
> >  laid (as a Julian day - egglayed), number of nestlings in nest (nestlings)
> >  and whether they are male or female (sex). Each bird obviously originates
> >  from a nest with some birds originating from the same nest (siblings). As
> >  there is the potential for the fitness of siblings to be similar (either due
> >  to genetic or environmental effects) I would like to include nest as a
> >  random effect to reflect this potential grouping structure. For example
> 
> >  model <- lmer(fitness ~ egglayed + nestlings + sex +(1|nest))
> 
> >  I have many nests (175) but about half of them contain only 1 individual.
> 
> >  My question is: does it make sense to include nest as a random effect given
> >  that many nests only contain one individual? I know this probably reflects a
> >  rather deep misunderstanding regarding mixed effects models on my part but I
> >  would have thought that it would be impossible to estimate a within-nest
> >  variance with only one individual, and that this would make my between-nest
> >  variance estimates meaningless.
> 
> That's not a problem as long as you recognize that you will get almost
> no new information from the groups that have only one observation. In
> other words you will get almost the same parameter estimates from the
> complete data set as you would get from the data after eliminating
> those nests with only one individual.  If you wrote out all of the
> error terms for each observation you would see that for those nests
> with only one observation you have two confounded error terms.
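>
> As a sketch of what "writing out the error terms" could look like for a
> random-intercept model (the notation is introduced here for illustration
> and is not taken from the thread): bird j in nest i has fixed-effects
> covariates x_ij, a nest effect b_i and a residual e_ij.
>
> ```latex
> \begin{align*}
> y_{ij} &= x_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
>   & b_i &\sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2) \\
> % a nest with a single bird (n_i = 1) contributes only one combined term:
> y_{i1} - x_{i1}^{\top}\beta &= b_i + \varepsilon_{i1}
>   & &\sim N(0, \sigma_b^2 + \sigma^2)
> \end{align*}
> ```
>
> Under these assumptions a single-bird nest carries information about the
> sum of the two variances and about the fixed effects, but not about how
> the sum splits into between-nest and within-nest parts.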
> 
> I have seen this effect when fitting models to the 'star' data set in
> the mlmRev package.  Because these are longitudinal data, groups are
> indexed by individuals (students, in this case)  and the number of
> observations per group is the number of times the student takes a
> test.  Many students have only one observation.  For most models you
> can remove those students or keep them in without affecting the
> parameter estimates noticeably.

Do you mean all those unidatum students at once, or one at a time?
Presumably that also depends on the multivariate distribution of the
observations.
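
One way to look at the "all at once" version of this comparison is to fit
the same model with and without the single-observation groups. The
following is only a sketch on simulated data: the data frame `birds`, the
nest-size probabilities, the coefficients and the two standard deviations
are all made-up assumptions, and `nestlings` is left out to keep the
simulation short. It uses `lmer()`, `fixef()` and `VarCorr()` from lme4.

```r
library(lme4)

set.seed(2008)  # arbitrary seed, for reproducibility only

# Simulated stand-in for the bird data (all values are assumptions):
# 175 nests, about half of them with a single measured bird.
n_nests   <- 175
nest_size <- sample(1:4, n_nests, replace = TRUE,
                    prob = c(0.5, 0.2, 0.2, 0.1))
nest_id   <- rep(seq_len(n_nests), times = nest_size)
n_obs     <- length(nest_id)

birds <- data.frame(
  nest     = factor(nest_id),
  egglayed = rep(round(runif(n_nests, 100, 160)), times = nest_size),
  sex      = factor(sample(c("F", "M"), n_obs, replace = TRUE))
)

nest_effect   <- rnorm(n_nests, sd = 1.0)          # between-nest SD (assumed)
birds$fitness <- 20 - 0.05 * birds$egglayed +
  0.5 * (birds$sex == "M") +
  nest_effect[nest_id] + rnorm(n_obs, sd = 1.5)    # within-nest SD (assumed)

# Count birds per nest, then drop every single-bird nest at once.
birds$n_in_nest <- ave(seq_len(n_obs), birds$nest, FUN = length)
multi <- droplevels(birds[birds$n_in_nest > 1, ])

form      <- fitness ~ egglayed + sex + (1 | nest)
fit_all   <- lmer(form, data = birds)
fit_multi <- lmer(form, data = multi)

# Fixed effects side by side, then the variance components of each fit.
print(cbind(all = fixef(fit_all), multi = fixef(fit_multi)))
print(VarCorr(fit_all))
print(VarCorr(fit_multi))
```

How close the two sets of estimates come out will depend on the simulated
design, so this is a way to explore the question rather than an answer to it.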

Andrew

> >  Many, many thanks for your advice in advance.
> >  Best wishes
> >  Martin
> >
> >  _______________________________________________
> >  R-sig-mixed-models at r-project.org mailing list
> >  https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/