[R-sig-ME] unbalanced data in nested lmer model

Mon Mar 29 19:55:01 CEST 2010

Hello,

as Andrew and others already explained:

> There is more than 500 cases

Fine, this might give you reasonable estimates of how your y is affected 
by your fixed-effect covariates (x1, x2, ...).

> (...) on 8 farms in 6 regions.
# and, from your previous post
>For 2 of 8 regions there is only 1 farm, the other regions have 2 farms.

so there is no way to separate region effects from farm effects for those 2 
regions, and very, very limited power for the other 6 (just 2 farms per 
region). To make things worse, your data are also quite unbalanced:

>unbalance of case numbers in cells? Or would it be no problem if cell sizes 
>vary between 0 and 53?

which I think means that for some farms you have only one record? Anyway, to 
recap: probably OK data for understanding y~x1+x2 etc., but insufficient data 
otherwise (you should invest in getting data for more farms within regions, 
not more data for the farms you have already sampled).
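
A quick way to see how thin the design is at each level is to cross-tabulate 
it. A minimal sketch, assuming a data frame with columns called region and 
farm (hypothetical names; the toy data below are made up for illustration and 
are not your data):

```r
# Toy data: a made-up design for illustration only.
d <- data.frame(
  region = c("A", "A", "A", "B", "B", "C"),
  farm   = c("f1", "f1", "f2", "f3", "f4", "f5")
)

# Records per farm within each region (rows = region, columns = farm).
cells <- table(d$region, d$farm)

# Number of distinct farms observed in each region.
farms_per_region <- rowSums(cells > 0)

# Regions with a single farm: region and farm are confounded there.
single_farm_regions <- names(farms_per_region)[farms_per_region == 1]

print(cells)
print(farms_per_region)
print(single_farm_regions)
```

Any region listed in single_farm_regions contributes nothing to telling 
region variance apart from farm variance.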

> Moreover I don't understand your argument that fitting random effects with 
> less than 5 levels was dodgy, as often examples in the books have 3 
> samples from one beach, or 3 laboratory workers within one laboratory. 
> These are less than 5 levels, are they not?

These are usually toy datasets meant to show how the approach works. I do 
not think they claim that the resulting variance estimates are very 
reliable (I think the Zuur et al. mixed effects book has more realistic 
examples, if I remember well). Plus, "level" refers to the number of beaches 
or the number of labs etc., and that number drives how well the variance is 
estimated. With fewer than, say, 5 levels you are probably better off fitting 
the factor as a fixed effect and not trying to decompose the variance into 
between labs and within labs etc. Anyway, just my 2 cents and I hope I 
explained this correctly...
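
To get a feel for why few levels are a problem, here is a rough simulation 
sketch in base R. It makes a strong simplifying assumption: the group effects 
are observed directly and drawn from N(0, 1), so the true between-group 
variance is 1. Real data add within-group noise on top. The function name 
sim_var and the choice of 3 versus 30 levels are mine, purely for 
illustration:

```r
set.seed(1)

# Estimate the between-group variance from n_levels group effects,
# repeated n_sim times. True variance is 1 by construction.
sim_var <- function(n_levels, n_sim = 2000) {
  replicate(n_sim, var(rnorm(n_levels)))
}

v3  <- sim_var(3)   # e.g. 3 labs
v30 <- sim_var(30)  # e.g. 30 labs

# Central 90% range of the estimated variance under each design.
print(quantile(v3,  c(0.05, 0.95)))
print(quantile(v30, c(0.05, 0.95)))
```

The range for 3 levels should come out much wider than for 30, which is the 
sense in which a variance estimated from a handful of levels is unreliable.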

See also the wiki page set up by Ben Bolker:
http://glmm.wikidot.com/faq

e.g. you might be interested in this entry therein:

Zero or very small random effects variance estimates;
(...)
Very small variance estimates, or very large correlation estimates, often 
indicate unidentifiability/lack of data (either due to exact 
unidentifiability [e.g. designs that are not replicated at an important 
level] or weak unidentifiability [designs that would be workable with more 
data of the same type]).
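
In practice you can check a fitted model for this kind of boundary estimate. 
A hedged sketch, assuming the lme4 package is installed and recent enough to 
provide isSingular(); the data are simulated with farm effects but no true 
region effect, and all object names (d, farm_eff, fit) are made up for the 
example:

```r
# Runs only if lme4 is available; otherwise the block is skipped.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(42)

  # 3 regions, 2 farms nested in each, 20 records per farm.
  d <- data.frame(
    region = factor(rep(1:3, each = 40)),
    farm   = factor(rep(1:6, each = 20))
  )

  # Simulated response: farm effects plus noise, NO region effect.
  farm_eff <- rnorm(6, sd = 1)
  d$y <- farm_eff[as.integer(d$farm)] + rnorm(nrow(d), sd = 1)

  fit <- lme4::lmer(y ~ 1 + (1 | region) + (1 | region:farm), data = d)

  print(lme4::VarCorr(fit))     # estimated standard deviation per term
  print(lme4::isSingular(fit))  # TRUE if a variance sits on the boundary
}
```

A region standard deviation at or near zero here would not show that regions 
are all alike, only that this design cannot separate them from farms.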

HTH

Cheers,

Luca

----- Original Message ----- 
From: "Jana Bürger" <jana.buerger at uni-rostock.de>
To: "Andrew Dolman" <andydolman at gmail.com>
Cc: <r-sig-mixed-models at r-project.org>
Sent: Monday, March 29, 2010 10:17 AM
Subject: Re: [R-sig-ME] unbalanced data in nested lmer model

> Dear Andrew and other list members,
> As I described in an earlier 
> post (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/003503.html)
> my data is actually hierarchical down to the level of fields within farms.
>
> There is more than 500 cases on 8 farms in 6 regions.
> Would you not think that gives enough power to distinguish within region 
> variability vs. between regions?
>
> Moreover I don't understand your argument that fitting random effects with 
> less than 5 levels was dodgy, as often examples in the books have 3 
> samples from one beach, or 3 laboratory workers within one laboratory. 
> These are less than 5 levels, are they not?
>
> Regards, Jana
>
> Andrew Dolman schrieb:
>> Dear Jana,
>>
>>  >An anova(lm1, lm2)  lm1<-lmer(y~x1+x2+...+(1|region)+(1|region:farm)); 
>> lm2<-lmer(y~x1+x2+...+(1|farm)) said models did not differ significantly 
>> and AIC was about the same. So I know there is no additional explanatory 
>> power including the region term.
>>
>>  >Yet, I would like to keep the region effect in the model to separate 
>> and compare the effect size of region vs. farm. Is it valid to do so even 
>> if  some of the regions are only represented by one farm?
>>
>> I don't think you have the data to ask questions about differences 
>> between regions as distinct from differences between farms. Look at it 
>> this way. If you were just doing a normal comparison between regions and 
>> you only looked at 1 or 2 farms per region, would you have the 
>> statistical power to say that differences were due to region rather than 
>> farm? Answer = No.
>>
>> Similarly, are the differences between the farms because they are in 
>> different regions or just normal variation between farms? Well you only 
>> have 2 farms per region so it's hard to tell. Maybe you just have enough 
>> data if pairs of farms within regions are always very similar and 
>> differences between regions large.
>>
>> Also. Fitting random effects with fewer than 5 levels is dodgy, and you 
>> only have 2 levels of farm per region, sometimes 1.
>>
>> Perhaps you could look at it this way.
>>
>> compare
>>
>> m1 <- lmer (y~(1|region))
>> m2 <- lmer (y~(1|farm))
>>
>> If m2 is better then there is variation between farms within regions, if 
>> there's no difference then region accounts for most of the variation. BUT 
>> you've not got much power to detect farm effects within regions, so a 
>> null result is not strong evidence for the absence of farm variation 
>> within regions.
>>
>>
>> Andy.
>>  andydolman at gmail.com
>>
>>
>>
>
> -- 
> Jana Bürger
>
> Universität Rostock
> Agrar-  und Umweltwissenschaftliche Fakultät
> FG Phytomedizin
> Satower Straße 48
> 18059 Rostock
>
> Tel. 0381-498 31 71
> Fax.0381-498 31 62