[R-sig-ME] [R] Balanced design, differences in results using anova and lmer/anova

Douglas Bates bates at stat.wisc.edu
Tue Mar 3 19:18:10 CET 2009


On Fri, Feb 27, 2009 at 3:06 AM, Lars Kunert <lkunert at mpi-sb.mpg.de> wrote:
> Hi, I am trying to do an analysis of variance for an unbalanced design.
> As a toy example, I use a dataset presented by K. Hinkelmann and O.
> Kempthorne in "Design and Anaylysis of Experiments" (p353-356).
> This example is very similar to my own dataset, with one difference: it
> is balanced.
> Thus it is possible to do an anaylsis using both: (1) anova, and (2) lmer.
> Furthermore, I can compare my results with the results presented in the
> book (the book uses SAS).

> In short:
>> using anova, I can reproduce the results presented in the book.
>> using lmer, I fail to reproduce the results
> However, for my "real" analysis, I need lmer - what do I do wrong?

> The example uses as randomized complete block desigh (RCBD) with a
> nested blocking structure
> and subsampling.

> response:
>  height (of some trees)
> covariates:
>  HSF (type of the trees)
> nested covariates:
>  loc (location)
>  block  (block is nested in location)
>
> # the data (file: pine.txt) looks like this:
>
> loc    block    HSF    height
> 1    1    1    210
> 1    1    1    221
> 1    1    2    252
> 1    1    2    260
> 1    1    3    197
> 1    1    3    190
> 1    2    1    222
> 1    2    1    214
> 1    2    2    265
> 1    2    2    271
> 1    2    3    201
> 1    2    3    210
> 1    3    1    220
> 1    3    1    225
> 1    3    2    271
> 1    3    2    277
> 1    3    3    205
> 1    3    3    204
> 1    4    1    224
> 1    4    1    231
> 1    4    2    270
> 1    4    2    283
> 1    4    3    211
> 1    4    3    216
> 2    1    1    178
> 2    1    1    175
> 2    1    2    191
> 2    1    2    193
> 2    1    3    182
> 2    1    3    179
> 2    2    1    180
> 2    2    1    184
> 2    2    2    198
> 2    2    2    201
> 2    2    3    183
> 2    2    3    190
> 2    3    1    189
> 2    3    1    183
> 2    3    2    200
> 2    3    2    195
> 2    3    3    197
> 2    3    3    205
> 2    4    1    184
> 2    4    1    192
> 2    4    2    197
> 2    4    2    204
> 2    4    3    192
> 2    4    3    190
>
> #
> # then I load the data
> #
> read.data = function()
> {
>        d = read.table( "pines.txt", header=TRUE )
>
>        d$loc       = as.factor( d$loc   )
>        d$block.tmp = as.factor( d$block )
>        d$block     = ( d$loc:d$block.tmp )[drop=TRUE]  # lme4 does not support
> implicit nesting
>
>        d$HSF   = as.factor( d$HSF )
>
>        return( d )
> }
>
> d = read.data()
>
>
> #
> # using anova.....
> #
> m.aov = aov( height ~ HSF*loc + Error(loc/block + HSF:loc/block), data=d )
> summary( m.aov )
>
> #
> # I get:
> #
> Error: loc
>    Df Sum Sq Mean Sq
> loc  1  20336   20336
>
> Error: loc:block
>          Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals  6 1462.33  243.72
>
> Error: loc:HSF
>        Df  Sum Sq Mean Sq
> HSF      2 12170.7  6085.3
> HSF:loc  2  6511.2  3255.6
>
> Error: loc:block:HSF
>          Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals 12 301.167  25.097
>
> Error: Within
>          Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 24 529.00   22.04
>
> #
> # which is, what I expected, however, using lmer....
> #
> m.lmer = lmer( height ~ HSF*loc + HSF*(loc|block), data=d )
> anova( m.lmer )
>
> #
> # I get:
> #
> Analysis of Variance Table
>        Df  Sum Sq Mean Sq
> HSF      2 12170.7  6085.3
> loc      1  1924.6  1924.6
> HSF:loc  2  6511.2  3255.6
>
> #
> # what is, at least not what I expected...
> #
> Thanks for your help, Lars

Before I begin fitting models I try to plot the data, as shown in the
enclosed files.  What I see there is a strong effect for location, a
strong effect for HSF and an HSF:loc interaction.  The effect for
block within location is marginal compared to the other effects.  You
didn't say, but I suspect that these are constructed data because the
levels of block seem to be systematically related to the height and I
would not expect that if the block was randomly chosen within
location.

As you and Rolf have both discovered, the block factor is implicitly
nested within loc so any random effects should be defined with respect
to loc:block instead of block.  An alternative is to label the blocks
so that different blocks get different labels, which seems eminently
sensible to me but I appreciate that the implicitly nested labels are
frequently used in texts and you may want to preserve the original
form of the data.

As for fitting models, I would probably not use random effects for loc
in these data because there are only two levels of loc.  When you use
random effects you are estimating variances not means and estimating a
variance from only two groups is precarious at best.

We may be able to be more helpful if you described the structure of your data.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pine.pdf
Type: application/pdf
Size: 8801 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20090303/294d4aec/attachment.pdf>


More information about the R-sig-mixed-models mailing list