[R-sig-eco] Should blocks be considered fixed or random?
Philip Dixon
pdixon at iastate.edu
Sat Apr 17 15:48:43 CEST 2010
To follow up on the discussion of Jens Oldeland's question:
The points made by Thierry and Ben are very relevant. One of the
suggestions was to treat blocks (banks in Jens's data) as a fixed
effect instead of a random effect. I've made that change in other
analyses and gotten very different results from the two ways of treating
blocks. That surprised me because in a traditional designed experiment,
in which each treatment occurs once in each block, whether blocks are
treated as fixed or random has no impact on conclusions about treatment
effects.
You get different results only when the blocks have different covariate
values. In that situation, there are three possible ways to define the
regression slope:
1) the within-block slope, i.e. the relationship between X and Y within
each bank. This is then pooled over the 4 blocks.
2) the between-block slope, i.e. the relationship between the average X
and the average Y, where each average is over samples within a block. In
Jens's study, this regression is based on 4 observations.
3) the 'ignore-block' slope, i.e. remove block from the model. This
estimate combines the within and between block information.
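The three slopes listed above can be computed directly. What follows is a minimal sketch in Python with NumPy, not Jens's analysis: the data are simulated, and the 4 blocks of 10 samples, the noise levels, and all variable names are assumptions made for illustration. The simulation deliberately gives the within-block and between-block relationships opposite signs. For balanced blocks, the ignore-block slope is a weighted average of the other two, with the weight on the within-block slope equal to the within-block share of the total sum of squares of X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 blocks (banks), 10 samples per block.
n_blocks, n_per = 4, 10
block = np.repeat(np.arange(n_blocks), n_per)
block_level_x = np.array([1.0, 2.0, 3.0, 4.0])
x = block_level_x[block] + rng.normal(0.0, 0.5, size=block.size)
# Within a block, Y rises with X (slope +1); across blocks, Y falls with X.
y = 1.0 * x - 3.0 * block_level_x[block] + rng.normal(0.0, 0.3, size=block.size)


def block_means(v, block):
    """Mean of v within each block."""
    return np.bincount(block, weights=v) / np.bincount(block)


def slope(u, v):
    """Least-squares slope of v on u (with an intercept)."""
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (uc @ uc))


xbar, ybar = block_means(x, block), block_means(y, block)

# 1) Within-block slope: centre X and Y on their block means, then pool.
xw, yw = x - xbar[block], y - ybar[block]
b_within = float(xw @ yw / (xw @ xw))

# 2) Between-block slope: regression on the 4 block averages.
b_between = slope(xbar, ybar)

# 3) Ignore-block slope: ordinary regression on all samples.
b_ignore = slope(x, y)

# Weight on the within-block slope in the balanced case.
w = float(xw @ xw / ((x - x.mean()) @ (x - x.mean())))

print(f"within  = {b_within:.3f}")
print(f"between = {b_between:.3f}")
print(f"ignore  = {b_ignore:.3f}")
print(f"w * within + (1 - w) * between = {w * b_within + (1 - w) * b_between:.3f}")
```

The last two printed values agree, which is the sense in which the ignore-block estimate combines within-block and between-block information.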
I believe these are three conceptually different ways to define the
relationship between X and Y; they are not three different ways to
estimate the same parameter. There are lots of ecological reasons why
the within and between block slopes can be different quantities. Hence,
the choice of estimate should be based on your decision about which
slope is the most relevant to your question.
If you think of blocks as fixed, you determine which quantity you are
estimating because you choose the terms in the model. If blocks are in
the model, you get parameter (1), the within-block slope. If blocks are
left out, you get parameter (3) and if you fit the regression to block
averages, you get parameter (2).
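As a sketch of how the choice of model terms selects the parameter when blocks are fixed, the three fits can be written as ordinary least-squares problems with different design matrices. This is an illustration in Python with NumPy on simulated data; the block sizes, coefficients, and the helper name last_coef are assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 4 blocks, 8 samples per block.
n_blocks, n_per = 4, 8
block = np.repeat(np.arange(n_blocks), n_per)
x = np.array([0.0, 1.0, 2.0, 3.0])[block] + rng.normal(0.0, 0.4, block.size)
y = (0.5 * x
     + np.array([4.0, 2.0, 1.0, -1.0])[block]
     + rng.normal(0.0, 0.2, block.size))


def last_coef(design, response):
    """Least-squares fit; return the coefficient of the last design column."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return float(coef[-1])


# One indicator column per block (fixed block effects, no separate intercept).
dummies = (block[:, None] == np.arange(n_blocks)).astype(float)
ones = np.ones((block.size, 1))

# Blocks in the model: the X coefficient is the within-block slope (1).
slope_blocks_in = last_coef(np.column_stack([dummies, x]), y)

# Blocks left out: the X coefficient is the ignore-block slope (3).
slope_blocks_out = last_coef(np.column_stack([ones, x]), y)

# Regression on block averages: the between-block slope (2).
xbar = dummies.T @ x / n_per
ybar = dummies.T @ y / n_per
slope_block_means = last_coef(np.column_stack([np.ones(n_blocks), xbar]), ybar)

print(slope_blocks_in, slope_blocks_out, slope_block_means)
```

The coefficient from the model with block indicators matches the slope obtained by centring X and Y on their block means, which is why including blocks as a fixed effect yields parameter (1).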
If you treat blocks as random, the data chooses the parameter for you!
This is because of the general phenomenon of recovery of inter-block
information when blocks are treated as a random effect. If the estimated
block variance is zero, the regression coefficient estimates the
ignore-block parameter (3). If the estimated block variance is large,
relative to the error variance, the coefficient estimates the
within-block parameter (1). If block variance is intermediate, you are
estimating something in between the two.
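The dependence on the block variance can be sketched numerically. Mixed-model software estimates the variance components from the data; the sketch below instead plugs in values by hand, assumes balanced blocks and a random intercept only, and uses the standard generalized least squares form of the estimator (subtract a fraction theta of each block mean, then run ordinary least squares). The simulated data, the function name gls_slope, and the chosen variance values are all assumptions for illustration.

```python
import numpy as np


def block_means(v, block):
    """Mean of v within each block."""
    return np.bincount(block, weights=v) / np.bincount(block)


def gls_slope(x, y, block, var_block, var_error):
    """GLS slope for a random-intercept model with known variances.

    Assumes balanced blocks (the same number of samples in every block).
    """
    n_per = np.bincount(block)[0]
    theta = 1.0 - np.sqrt(var_error / (var_error + n_per * var_block))
    # Partial centring: remove a fraction theta of each block mean.
    xs = x - theta * block_means(x, block)[block]
    ys = y - theta * block_means(y, block)[block]
    design = np.column_stack([np.full(x.size, 1.0 - theta), xs])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(2)

# Hypothetical data: 4 blocks, 10 samples per block, with within-block
# and between-block relationships of opposite sign.
block = np.repeat(np.arange(4), 10)
level = np.array([1.0, 2.0, 3.0, 4.0])
x = level[block] + rng.normal(0.0, 0.5, block.size)
y = x - 3.0 * level[block] + rng.normal(0.0, 0.3, block.size)

# Reference values: ignore-block slope and within-block slope.
xc, yc = x - x.mean(), y - y.mean()
b_ignore = float(xc @ yc / (xc @ xc))
xw = x - block_means(x, block)[block]
yw = y - block_means(y, block)[block]
b_within = float(xw @ yw / (xw @ xw))

b_zero = gls_slope(x, y, block, var_block=0.0, var_error=0.09)
b_mid = gls_slope(x, y, block, var_block=0.01, var_error=0.09)
b_large = gls_slope(x, y, block, var_block=1e6, var_error=0.09)

print(f"ignore-block slope      = {b_ignore:.3f}")
print(f"GLS, block variance 0   = {b_zero:.3f}")
print(f"GLS, block variance mid = {b_mid:.3f}")
print(f"GLS, block variance big = {b_large:.3f}")
print(f"within-block slope      = {b_within:.3f}")
```

With the block variance set to zero the estimate coincides with the ignore-block slope, with a very large block variance it approaches the within-block slope, and an intermediate value falls between the two.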
I believe this argues for never treating blocks as a random effect when
your goal is to estimate regression parameters.
The one possible exception is if you are fitting a random coefficients
regression in which both intercept and slope vary between blocks. I
don't know whether similar issues arise in this model.
Philip Dixon