[R-sig-ME] unbalanced data in nested lmer model
Luca Borger
lborger at uoguelph.ca
Mon Mar 29 19:55:01 CEST 2010
Hello,
as Andrew and others already explained:
> There is more than 500 cases
Fine, this might give you reasonable estimates of how your y is affected
by your fixed-effect covariates (x1, x2, ...)
> (...) on 8 farms in 6 regions.
and, from your previous post:
>For 2 of 8 regions there is only 1 farm, the other regions have 2 farms.
Thus there is no way to distinguish region effects from farm effects for 2
regions, and very, very limited power for the other 6 (just 2 farms per
region). To make things worse, your data are also quite unbalanced:
>unbalance of case numbers in cells? Or would it be no problem if cell sizes
>vary between 0 and 53?
which I think means that for some farms you have only one record? Anyway, to
recap: the data are probably OK for understanding y~x1+x2 etc., but
insufficient otherwise (you should invest in getting data for more farms
within regions, not more data for the farms you have already sampled).
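
Tabulating the design before fitting anything shows how thin it is. A minimal sketch in base R, assuming a data frame `d` with columns `region` and `farm` (these names and the toy values are invented for illustration, they are not the data discussed here):

```r
# Hypothetical toy data: 'd', 'region' and 'farm' are illustrative names only
d <- data.frame(
  region = c("A", "A", "A", "B", "B", "C", "C", "C"),
  farm   = c("f1", "f1", "f2", "f3", "f3", "f4", "f5", "f5")
)

# Number of distinct farms sampled in each region
farms_per_region <- tapply(d$farm, d$region, function(f) length(unique(f)))
print(farms_per_region)

# Regions with a single farm: region and farm effects are confounded there
single_farm_regions <- names(farms_per_region)[farms_per_region < 2]
print(single_farm_regions)

# Records per farm, to see how unbalanced the cells are
records_per_farm <- table(d$farm)
print(records_per_farm)
```

Any region that shows up in `single_farm_regions` contributes nothing to separating region from farm variation, however many records that one farm has.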
> Moreover I don't understand your argument that fitting random effects with
> less than 5 levels was dodgy, as often examples in the books have 3
> samples from one beach, or 3 laboratory workers within one laboratory.
> These are less than 5 levels, are they not?
These are usually toy datasets meant to show how the approach works; I do not
think they claim that the resulting variance estimates are very reliable (I
think the Zuur et al. mixed effects book has more realistic examples, if I
remember well). Plus, "level" refers to the number of beaches or the number
of labs etc., and it is this number that determines how well the variance is
estimated. With fewer than, say, 5 levels it appears that you might be better
off fitting the factor as a fixed effect rather than trying to decompose the
variance into between-lab and within-lab components etc. Anyway, just my 2
cents, and I hope I explained this correctly...
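
A small simulation may make the point about the number of levels more concrete. The sketch below is a simplification and not a mixed model fit: it draws k "true" group effects from a normal distribution with variance 1 and looks at how much their sample variance scatters across repeated draws. The function name `var_spread` and all settings are made up for illustration. Within-group noise is ignored, so a real variance-component estimate would, if anything, be noisier.

```r
set.seed(42)

# Spread of the sample variance of k group effects drawn from N(0, 1).
# Within-group noise is ignored, so this is an optimistic case.
var_spread <- function(k, n_sim = 2000) {
  v <- replicate(n_sim, var(rnorm(k, mean = 0, sd = 1)))
  quantile(v, probs = c(0.025, 0.5, 0.975))
}

spread_3  <- var_spread(3)   # e.g. 3 beaches or 3 lab workers
spread_30 <- var_spread(30)  # a more comfortable number of levels

# Rows: number of levels; columns: 2.5%, 50% and 97.5% quantiles
print(rbind(k3 = spread_3, k30 = spread_30))
```

The true variance is 1 in both cases, but the interval for k = 3 should come out much wider than the one for k = 30, which is the sense in which a variance estimated from a handful of levels is unreliable.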
See also the wiki page set up by Ben Bolker:
http://glmm.wikidot.com/faq
e.g. you might be interested in this entry therein:
Zero or very small random effects variance estimates;
(...)
Very small variance estimates, or very large correlation estimates, often
indicate unidentifiability/lack of data (either due to exact
unidentifiability [e.g. designs that are not replicated at an important
level] or weak identifiability [designs that would be workable with more
data of the same type])
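
One way to look for this symptom in a fitted model is sketched below. It assumes the lme4 package is installed in a version that provides `isSingular()`; the data are simulated placeholders (`d`, `g` and `y` are invented names) with only 3 groups and no true between-group variance.

```r
library(lme4)  # assumption: lme4 is installed and provides isSingular()

set.seed(7)
# Placeholder data: 3 groups of 10 records, no true between-group variance
d <- data.frame(g = factor(rep(1:3, each = 10)), y = rnorm(30))

m <- lmer(y ~ 1 + (1 | g), data = d)

# Estimated standard deviations of the random intercept and the residual
print(VarCorr(m))

# TRUE if a variance component is estimated at (or very near) zero
print(isSingular(m))
```

A zero or near-zero estimate here does not show that the groups are identical, only that these data cannot pin the variance down.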
HTH
Cheers,
Luca
----- Original Message -----
From: "Jana Bürger" <jana.buerger at uni-rostock.de>
To: "Andrew Dolman" <andydolman at gmail.com>
Cc: <r-sig-mixed-models at r-project.org>
Sent: Monday, March 29, 2010 10:17 AM
Subject: Re: [R-sig-ME] unbalanced data in nested lmer model
> Dear Andrew and other list members,
> As I described in an earlier
> post(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/003503.html)
> my data is actually hierarchical down to the level of fields within farms.
>
> There is more than 500 cases on 8 farms in 6 regions.
> Would you not think that gives enough power to distinguish within region
> variability vs. between regions?
>
> Moreover I don't understand your argument that fitting random effects with
> less than 5 levels was dodgy, as often examples in the books have 3
> samples from one beach, or 3 laboratory workers within one laboratory.
> These are less than 5 levels, are they not?
>
> Regards, Jana
>
> Andrew Dolman wrote:
>> Dear Jana,
>>
>> >An anova(lm1, lm2), with lm1<-lmer(y~x1+x2+...+(1|region)+(1|region:farm))
>> and lm2<-lmer(y~x1+x2+...+(1|farm)), said the models did not differ
>> significantly and AIC was about the same. So I know there is no additional
>> explanatory power from including the region term.
>>
>> >Yet, I would like to keep the region effect in the model to separate
>> and compare the effect size of region vs. farm. Is it valid to do so even
>> if some of the regions are only represented by one farm?
>>
>> I don't think you have the data to ask questions about differences
>> between regions as distinct from differences between farms. Look at it
>> this way. If you were just doing a normal comparison between regions and
>> you only looked at 1 or 2 farms per region, would you have the
>> statistical power to say that differences were due to region rather than
>> farm? Answer = No.
>>
>> Similarly, are the differences between the farms because they are in
>> different regions or just normal variation between farms? Well you only
>> have 2 farms per region so it's hard to tell. Maybe you just have enough
>> data if pairs of farms within regions are always very similar and
>> differences between regions large.
>>
>> Also, fitting random effects with fewer than 5 levels is dodgy, and you
>> only have 2 levels of farm per region, sometimes 1.
>>
>> Perhaps you could look at it this way.
>>
>> compare
>>
>> m1 <- lmer (y~(1|region))
>> m2 <- lmer (y~(1|farm))
>>
>> If m2 is better then there is variation between farms within regions, if
>> there's no difference then region accounts for most of the variation. BUT
>> you've not got much power to detect farm effects within regions, so a
>> null result is not strong evidence for the absence of farm variation
>> within regions.
>>
>>
>> Andy.
>> andydolman at gmail.com
>>
>>
>>
>
> --
> Jana Bürger
>
> Universität Rostock
> Agrar- und Umweltwissenschaftliche Fakultät
> FG Phytomedizin
> Satower Straße 48
> 18059 Rostock
>
> Tel. 0381-498 31 71
> Fax.0381-498 31 62
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>