[R-sig-eco] Should blocks be considered fixed or random?

Sat Apr 17 15:48:43 CEST 2010

To follow on the discussion of Jens Oldeland's question:

The points made by Thierry and Ben are very relevant.  One of the 
suggestions was to treat blocks (banks in Jens's data) as a fixed 
effect instead of a random effect.  I've made that change in other 
analyses and gotten very different results from the two ways of 
treating blocks.  That surprised me because in a traditional designed 
experiment, in which each treatment occurs once in each block, the 
treatment of blocks (fixed or random) has no impact on conclusions 
about treatment effects.

You get different results only when the blocks have different covariate 
values.  In that situation, there are three possible ways to define the 
regression slope:
1) the within-block slope, i.e. the relationship between X and Y within 
each bank.  This is then pooled over the 4 blocks.
2) the between-block slope, i.e. the relationship between the average X 
and the average Y, where each average is over samples within a block. In 
Jens's study, this regression is based on 4 observations.
3) the 'ignore-block' slope, i.e. remove block from the model.  This 
estimate combines the within and between block information.
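
To make the three definitions concrete, here is a minimal sketch in 
Python with NumPy.  The data are made up for illustration (4 blocks of 5 
samples, no noise, a within-block slope of +1 and a between-block slope 
of -2); they are not Jens's data, and the function names are my own.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def three_slopes(x, y, block):
    """Return the (within, between, ignore-block) slopes."""
    _, idx = np.unique(block, return_inverse=True)
    counts = np.bincount(idx)
    x_bar = np.bincount(idx, weights=x) / counts  # block means of X
    y_bar = np.bincount(idx, weights=y) / counts  # block means of Y
    # (1) within-block: regress deviations from block means, pooled
    within = slope(x - x_bar[idx], y - y_bar[idx])
    # (2) between-block: regress block means on block means
    between = slope(x_bar, y_bar)
    # (3) ignore-block: regress raw Y on raw X
    ignore = slope(x, y)
    return within, between, ignore

# Hypothetical data: 4 blocks, 5 samples each, no noise.
block = np.repeat(np.arange(4), 5)
offsets = np.tile([-1.0, -0.5, 0.0, 0.5, 1.0], 4)
x_mean = np.array([0.0, 2.0, 4.0, 6.0])[block]
x = x_mean + offsets
y = 10.0 - 2.0 * x_mean + 1.0 * offsets

within, between, ignore = three_slopes(x, y, block)
print(within, between, ignore)
```

With these made-up numbers the within-block slope is 1, the 
between-block slope is -2, and the ignore-block slope is about -1.73, 
which sits between the other two.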

I believe these are three conceptually different ways to define the 
relationship between X and Y; they are not three different ways to 
estimate the same parameter.  There are lots of ecological reasons why 
the within and between block slopes can be different quantities.  Hence, 
the choice of estimate should be based on your decision about which 
slope is the most relevant to your question.

If you think of blocks as fixed, you determine which quantity you are 
estimating because you choose the terms in the model.  If blocks are in 
the model, you get parameter (1), the within-block slope.  If blocks are 
left out, you get parameter (3) and if you fit the regression to block 
averages, you get parameter (2).
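
As a rough illustration of how the model terms pick the parameter, the 
sketch below fits ordinary least squares three ways with NumPy's 
`lstsq`.  The data are hypothetical (4 blocks of 5 samples, no noise, 
built so that the within-block slope is +1 and the between-block slope 
is -2), and `x_coef` is just a helper name I made up.

```python
import numpy as np

# Hypothetical data: 4 blocks, 5 samples each, no noise.
n_blocks = 4
block = np.repeat(np.arange(n_blocks), 5)
offsets = np.tile([-1.0, -0.5, 0.0, 0.5, 1.0], n_blocks)
x_mean = np.array([0.0, 2.0, 4.0, 6.0])[block]
x = x_mean + offsets
y = 10.0 - 2.0 * x_mean + 1.0 * offsets

def x_coef(design, response):
    """OLS coefficient of the first column of the design matrix."""
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return float(beta[0])

# Blocks in the model: X plus one indicator column per block.
dummies = (block[:, None] == np.arange(n_blocks)[None, :]).astype(float)
b_blocks_in = x_coef(np.column_stack([x, dummies]), y)

# Blocks left out: X plus a single intercept.
b_blocks_out = x_coef(np.column_stack([x, np.ones_like(x)]), y)

# Regression fitted to the block averages only (4 points).
xb = np.array([x[block == b].mean() for b in range(n_blocks)])
yb = np.array([y[block == b].mean() for b in range(n_blocks)])
b_block_means = x_coef(np.column_stack([xb, np.ones(n_blocks)]), yb)

print(b_blocks_in, b_blocks_out, b_block_means)
```

The first fit returns the within-block slope (1), the second the 
ignore-block slope, and the third the between-block slope (-2).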

If you treat blocks as random, the data chooses the parameter for you! 
This is because of the general phenomenon of recovery of inter-block 
information when blocks are treated as a random effect. If the estimated 
block variance is zero, the regression coefficient estimates the 
ignore-block parameter (3).  If the estimated block variance is large 
relative to the error variance, the coefficient estimates the 
within-block parameter (1).  If the block variance is intermediate, you 
are estimating something in between the two.
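
The mechanism can be sketched with generalized least squares for a 
random-intercept model.  One simplifying assumption to flag loudly: a 
real mixed-model fit estimates the block variance from the data, whereas 
here I plug in fixed values by hand to show what each value does to the 
slope.  The data are again made up (within-block slope +1, between-block 
slope -2), and `gls_slope` is my own helper, not a library function.

```python
import numpy as np

def gls_slope(x, y, block, var_block, var_error=1.0):
    """GLS slope for y = a + b*x + block effect + error.

    The variance components are ASSUMED known here; a mixed-model
    fit would estimate them from the data instead.
    """
    n = len(y)
    same_block = (block[:, None] == block[None, :]).astype(float)
    # Covariance of y: error variance on the diagonal, plus the block
    # variance shared by every pair of samples in the same block.
    V = var_error * np.eye(n) + var_block * same_block
    X = np.column_stack([np.ones(n), x])
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    return float(beta[1])

# Hypothetical data: 4 blocks, 5 samples each, no noise.
block = np.repeat(np.arange(4), 5)
offsets = np.tile([-1.0, -0.5, 0.0, 0.5, 1.0], 4)
x_mean = np.array([0.0, 2.0, 4.0, 6.0])[block]
x = x_mean + offsets
y = 10.0 - 2.0 * x_mean + 1.0 * offsets

slope_zero = gls_slope(x, y, block, var_block=0.0)   # no block variance
slope_mid = gls_slope(x, y, block, var_block=1.0)    # intermediate
slope_large = gls_slope(x, y, block, var_block=1e6)  # very large
print(slope_zero, slope_mid, slope_large)
```

With zero block variance the slope equals the ignore-block value (about 
-1.73 for these numbers); with a very large block variance it approaches 
the within-block value of 1; the intermediate case falls between them.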

I believe this argues for never treating blocks as a random effect when 
your goal is to estimate regression parameters.

The one possible exception is if you are fitting a random coefficients 
regression in which both intercept and slope vary between blocks.  I 
don't know whether similar issues arise in this model.

Philip Dixon