[R-sig-ME] advice on grouping structure - many levels but few individuals per level
Ken Beath
kjbeath at kagi.com
Thu Apr 10 12:13:29 CEST 2008
On 09/04/2008, at 11:47 PM, Douglas Bates wrote:
> On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
>> Dear lmer users,
>
>> I was hoping to get a little advice about specifying a grouping
>> structure with many levels but few (sometimes one) individuals per
>> level. I have had a look through the posting archives but could not
>> find a similar question. Many apologies in advance if I have missed
>> any.
>
>> The context of the question is as follows:
>
>> I would like to model fitness of juvenile birds (a simple weight-based
>> metric) with a number of explanatory variables, including when they
>> were laid (as a Julian day, egglayed), the number of nestlings in the
>> nest (nestlings), and whether they are male or female (sex). Each bird
>> obviously originates from a nest, with some birds originating from the
>> same nest (siblings). As there is the potential for the fitness of
>> siblings to be similar (due to either genetic or environmental
>> effects), I would like to include nest as a random effect to reflect
>> this potential grouping structure. For example:
>
>> model <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest))
>
>> I have many nests (175), but about half of them contain only one
>> individual.
>
>> My question is: does it make sense to include nest as a random effect
>> given that many nests contain only one individual? I know this
>> probably reflects a rather deep misunderstanding of mixed-effects
>> models on my part, but I would have thought that it would be
>> impossible to estimate a within-nest variance from only one
>> individual, which would make my between-nest variance estimates
>> meaningless.
>
> That's not a problem as long as you recognize that you will get almost
> no new information from the groups that have only one observation. In
> other words, you will get almost the same parameter estimates from the
> complete data set as you would get after eliminating the nests with
> only one individual. If you wrote out all of the error terms for each
> observation, you would see that for nests with only one observation
> you have two confounded error terms.
>
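Writing out the error terms for the random-intercept model in the question makes the confounding concrete. This is a sketch assuming nest $i$, bird $j$, covariate vector $\mathbf{x}_{ij}$, and independent normal nest and bird effects; the notation is illustrative and not taken from the original messages. Variances are conditional on the covariates.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^\top\boldsymbol\beta + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0,\sigma_b^2),\ \ \varepsilon_{ij} \sim N(0,\sigma^2)\\
n_i = 1:&\quad \operatorname{Var}(y_{i1}) = \sigma_b^2 + \sigma^2\\
n_i = 2:&\quad \operatorname{Cov}(y_{i1}, y_{i2}) = \sigma_b^2,
  \qquad \operatorname{Var}(y_{i1} - y_{i2}) = 2\sigma^2
\end{aligned}
```

A single-bird nest only reveals the sum $b_i + \varepsilon_{i1}$, so by itself it cannot split $\sigma_b^2 + \sigma^2$ into its two parts; that split comes from the within-nest contrasts in nests of two or more birds.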
> I have seen this effect when fitting models to the 'star' data set in
> the mlmRev package. Because these are longitudinal data, groups are
> indexed by individuals (students, in this case) and the number of
> observations per group is the number of times the student takes a
> test. Many students have only one observation. For most models you
> can remove those students or keep them in without affecting the
> parameter estimates noticeably.
>
This depends on the data. If the within-cluster correlation is high,
then a large cluster carries little more information than a small one.
In that case, take out half the clusters and the standard errors will
increase by 30% or more.
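The dependence on the within-cluster correlation can be illustrated with the standard design-effect (effective sample size) formula. This is a minimal sketch that assumes an intercept-only model with known variance components and a common within-nest correlation rho = sigma_b^2 / (sigma_b^2 + sigma^2); under those assumptions a nest of m birds is worth m / (1 + (m - 1) * rho) independent observations. The nest sizes (88 single-bird nests plus 87 hypothetical three-bird nests, loosely echoing the 175 nests in the question) and the rho values are made up for illustration.

```r
# Effective number of independent observations contributed by a cluster of
# size m whose members share within-cluster correlation rho (design effect).
eff_n <- function(m, rho) m / (1 + (m - 1) * rho)

# Hypothetical nest sizes: 88 single-bird nests and 87 three-bird nests.
sizes <- c(rep(1, 88), rep(3, 87))

# Ratio of the SE of the estimated mean after dropping the single-bird nests
# to the SE using all nests (the SE is proportional to 1 / sqrt(n_eff)).
se_ratio <- function(sizes, rho) {
  n_all  <- sum(eff_n(sizes, rho))
  n_kept <- sum(eff_n(sizes[sizes > 1], rho))
  sqrt(n_all / n_kept)
}

for (rho in c(0.1, 0.5, 0.8)) {
  cat(sprintf("rho = %.1f: SE increases by %.0f%% when singletons are dropped\n",
              rho, 100 * (se_ratio(sizes, rho) - 1)))
}
```

With these made-up sizes, a three-bird nest is worth 2.5 observations at rho = 0.1 but only about 1.15 at rho = 0.8, so dropping the singletons costs roughly 19% in standard error at rho = 0.1 and roughly 37% at rho = 0.8.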
My suggestion is to leave all the data in and fit a random-effects
model, as this will work fine. The original concern was that the
within-nest variance couldn't be calculated for clusters with single
observations, but this is not a problem: it is estimated from the
nests that have two or more birds.
Ken
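The point that single-bird nests do not prevent estimation of the within-nest variance can be checked with a small simulation. This is a minimal sketch in base R that swaps in the classical one-way ANOVA (method-of-moments) estimator of the within-nest variance rather than the REML fit that lmer performs; the nest sizes, standard deviations, and seed are hypothetical. In a single-bird nest each bird's deviation from its own nest mean is exactly zero, so such nests add nothing to the pooled within-nest sum of squares and nothing to its degrees of freedom.

```r
set.seed(1)

# Hypothetical design: 88 single-bird nests and 87 nests of three birds.
sizes <- c(rep(1, 88), rep(3, 87))
nest  <- factor(rep(seq_along(sizes), times = sizes))

# Simulate y = nest effect + bird-level error (no covariates, for brevity).
sd_nest <- 1    # between-nest SD (hypothetical)
sd_bird <- 0.5  # within-nest SD (hypothetical)
b <- rnorm(length(sizes), sd = sd_nest)
y <- b[as.integer(nest)] + rnorm(length(nest), sd = sd_bird)

# Pooled within-nest variance: squared deviations from each nest mean,
# divided by (number of birds - number of nests).
within_var <- function(y, nest) {
  nest <- droplevels(nest)
  dev  <- y - ave(y, nest)  # deviation of each bird from its own nest mean
  sum(dev^2) / (length(y) - nlevels(nest))
}

# Flag birds that belong to nests with two or more birds.
multi <- ave(seq_along(y), nest, FUN = length) > 1

s2_all   <- within_var(y, nest)
s2_multi <- within_var(y[multi], nest[multi])
print(c(all_nests = s2_all, multi_bird_nests_only = s2_multi))
```

The two estimates agree, so under this estimator the within-nest variance is identified entirely by the multi-bird nests; the single-bird nests still inform the total variance and, in a model with covariates, the fixed effects.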
>> Many, many thanks for your advice in advance.
>> Best wishes
>> Martin
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>