[R-sig-ME] Unbalanced nested design
Douglas Bates
bates at stat.wisc.edu
Wed Apr 22 14:50:36 CEST 2009
On Tue, Apr 21, 2009 at 8:32 AM, Stephen Cole <swbcole at gmail.com> wrote:
> Hello All - I would like to run a 2 factor nested ANOVA. The design
> is unbalanced as i have 6 sites in 3 regions and 3 sites in 1 other
> region. Site is nested in region I am interested in the differences
> in mean recruit density among 4 regions. I have used the lme function
> in the nlme library and am confused about the output. I have a copy
> of P/B and I quote p. 25 " The lme function does produce sensible
> maximum likelihood estimates or restricted maximum likelihood
> estimates from the unbalanced data." Thus, as i understand it lme can
> handle this unbalanced data set that i have. However, when i compared
> the lme model to an aov model, the fixed effects results are
> identical. (f-ratio and p-value). How does lme handle unbalanced data
> if the result is the same as aov which can not handle unbalanced data.
> Thank-you for any help provided.
Thank you for including a description of your data and the model that
you wish to fit.
The lme function and the newer lmer function in the lme4 package both
fit linear mixed-effects models by optimizing the log-likelihood (for
maximum likelihood, ML, estimates) or the REML criterion, which is a
related to the log-likelihood.
I can't really remember all the details of the aov fit. I think it
goes something like obtaining a least squares fit to the fixed-effects
and the random effects combined then performing a further
decomposition of the residual sum of squares into the various error
strata and performing F tests based on these various components. If
the design is completely balanced then the error strata are
orthogonal. I'm not really sure what happens for unbalanced designs.
The REML criterion, which is the default estimation criterion for lme,
is designed to "correct" the maximum likelihood estimates so they
more closely resemble the traditional results from the error strata
approach in the cases where error strata can be applied. I don't
think that the fact that they match up in this case should be
alarming. That is sort of what is supposed to happen.
> I have attached a subsection of my data. The total number of records
> is 420, with a sample of 20 quadrat counts from each site (21 sites x
> 20 = 420)
>
> Data:
> adults recruits region site site2 site3
> 1 138 1268.3300 ANS 1 1 ANS:1
> 2 131 608.3300 ANS 1 1 ANS:1
> 3 13 696.8800 ANS 1 1 ANS:1
> 4 12 412.5000 ANS 1 1 ANS:1
> 5 2 355.5600 ANS 1 1 ANS:1
> 6 0 528.0000 ANS 1 1 ANS:1
> 7 4 421.2100 ANS 1 1 ANS:1
> 8 0 378.0000 ANS 1 1 ANS:1
> 9 92 893.3300 ANS 1 1 ANS:1
> 10 77 1184.3100 ANS 1 1 ANS:1
> 11 92 961.4200 ANS 1 1 ANS:1
> 12 0 1029.0000 ANS 1 1 ANS:1
> 13 19 1144.6800 ANS 1 1 ANS:1
>
> Region (4 levels, fixed)
> Site (6 levels and 3 levels, random)
>
> data$site <- as.factor(data$site)
> data$site3 <-factor(data$region:data$site)
>
>
> mod.lme <- lme(recruits ~ region, data=data, random=~1|site3)
>
> anova(mod)
>
> numDF denDF F-value p-value
> (Intercept) 1 399 93.58730 <.0001
> region 3 17 19.21751 <.0001
>
> mod.aov <- aov(recruits ~ region + Error(site3), data=data)
> summary(mod.aov)\
>
> Error: site3
> Df Sum Sq Mean Sq F value Pr(>F)
> region 3 32024226 10674742 19.218 1.057e-05 ***
> Residuals 17 9442984 555470
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Error: Within
> Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 399 11839881 29674
>
> Now, both F-ratios are 19.21, I am not sure what i am doing
> incorrectly but I would appreciate any advice on my mistake. Thanks
> very much
>
> Stephen Cole
> Graduate student
> Marine Ecology Lab
> Saint Francis Xavier University
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
More information about the R-sig-mixed-models
mailing list