[R-sig-ME] advice on grouping structure - many levels but few individuals per level

Thu Apr 10 12:13:29 CEST 2008

On 09/04/2008, at 11:47 PM, Douglas Bates wrote:
> On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
>> Dear lmer users,
>
>> I was hoping to get a little advice about specifying a grouping structure
>> with many levels but few (sometimes one) individuals per level. I have had a
>> look through the posting archives but could not find a similar question.
>> Many apologies in advance if I have missed any.
>
>> The context of the question is as follows:
>
>> I would like to model the fitness of juvenile birds (a simple weight-based
>> metric) with a number of explanatory variables, including when they were
>> laid (as a Julian day, egglayed), the number of nestlings in the nest
>> (nestlings), and whether they are male or female (sex). Each bird obviously
>> originates from a nest, with some birds originating from the same nest
>> (siblings). As there is the potential for the fitness of siblings to be
>> similar (due to either genetic or environmental effects), I would like to
>> include nest as a random effect to reflect this potential grouping
>> structure. For example:
>
>> model <- lmer(fitness ~ egglayed + nestlings + sex +(1|nest))
>
>> I have many nests (175), but about half of them contain only 1 individual.
>
>> My question is: does it make sense to include nest as a random effect given
>> that many nests contain only one individual? I know this probably reflects a
>> rather deep misunderstanding of mixed-effects models on my part, but I
>> would have thought that it would be impossible to estimate a within-nest
>> variance with only one individual, and that this would make my between-nest
>> variance estimates meaningless.
>
> That's not a problem as long as you recognize that you will get almost
> no new information from the groups that have only one observation. In
> other words you will get almost the same parameter estimates from the
> complete data set as you would get from the data after eliminating
> those nests with only one individual.  If you wrote out all of the
> error terms for each observation you would see that for those nests
> with only one observation you have two confounded error terms.
>
> I have seen this effect when fitting models to the 'star' data set in
> the mlmRev package.  Because these are longitudinal data, groups are
> indexed by individuals (students, in this case)  and the number of
> observations per group is the number of times the student takes a
> test.  Many students have only one observation.  For most models you
> can remove those students or keep them in without affecting the
> parameter estimates noticeably.
>

This depends on the data. If the within-cluster correlation is high,
then a large cluster carries little more information than a small one.
In that case, removing half the clusters will increase the standard
errors by 30% or more.
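
As a rough illustration of this point (a back-of-the-envelope sketch, not a
calculation from the bird data): in a random-intercept model with known
variance components, a cluster of size n with intraclass correlation rho
contributes n / (1 + (n - 1) * rho) independent-observation equivalents to
the estimate of the overall mean. The nest sizes below (88 singletons and 87
nests of three birds) and the function names are assumptions made up for
illustration.

```r
# Information a cluster of size n contributes to the overall mean,
# in units of one independent observation, for intraclass correlation rho.
cluster_info <- function(n, rho) n / (1 + (n - 1) * rho)

# Hypothetical design: 88 singleton nests and 87 nests of three birds each.
sizes <- c(rep(1, 88), rep(3, 87))

# Ratio of the standard error of the overall mean after dropping the
# singleton nests to the standard error using all nests.
se_inflation <- function(rho) {
  full    <- sum(cluster_info(sizes, rho))
  reduced <- sum(cluster_info(sizes[sizes > 1], rho))
  sqrt(full / reduced)
}

rhos <- c(0, 0.2, 0.5, 0.8)
print(round(sapply(rhos, se_inflation), 2))
```

With these assumed sizes, dropping the singletons inflates the standard error
by about 16% when rho = 0 and by about 29% when rho = 0.5, and the inflation
keeps growing as rho increases.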

My suggestion is to leave all the data in and fit a random-effects
model, which will work fine. The original concern was that the
within-nest variance couldn't be estimated for clusters with a single
observation, but this is not a problem: that variance is estimated by
pooling across the nests that do contain more than one bird.
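
A small simulation can make this concrete. The sketch below assumes the lme4
package is installed and uses entirely made-up data: the brood sizes, the
effect sizes and the data frame name d are hypothetical, chosen so that about
half of the 175 nests hold a single bird. It is not the original analysis.

```r
library(lme4)

set.seed(1)
n_nests <- 175

# Hypothetical brood sizes: roughly half the nests hold a single bird.
brood <- sample(1:4, n_nests, replace = TRUE, prob = c(0.5, 0.2, 0.2, 0.1))
nest  <- factor(rep(seq_len(n_nests), times = brood))
n     <- length(nest)

# Made-up covariates; egglayed and nestlings are constant within a nest.
d <- data.frame(
  nest      = nest,
  egglayed  = rep(round(runif(n_nests, 100, 150)), times = brood),
  nestlings = rep(brood, times = brood),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Simulated response: nest-level and bird-level noise, both with sd = 1.
nest_effect <- rnorm(n_nests, sd = 1)
d$fitness <- 10 - 0.02 * d$egglayed + 0.5 * (d$sex == "M") +
  nest_effect[as.integer(d$nest)] + rnorm(n, sd = 1)

# Same model formula as in the original question, with singletons kept in.
fit <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest), data = d)
print(VarCorr(fit))
```

The fit reports both a between-nest and a residual (within-nest) variance
even though many nests are singletons; the residual variance is informed by
the nests with two or more birds.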

Ken

>> Many, many thanks for your advice in advance.
>> Best wishes
>> Martin
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>