[R-sig-ME] Should blocking factors be modeled as random effects?

Douglas Bates bates at stat.wisc.edu
Sat Jan 31 17:20:29 CET 2009


On Mon, Jan 26, 2009 at 2:42 PM, Prew, Paul <Paul.Prew at ecolab.com> wrote:
> I have been following your R discussion list on mixed modeling for a few
> weeks, in hopes of understanding mixed modeling better.  And it has
> helped.  I was not aware of the controversy surrounding degrees of
> freedom and the distribution of test statistics.  I have just been
> trusting the ANOVA output from software (Minitab, JMP) that reported F
> tests.  JMP uses Kenward-Roger, Minitab's ANOVA reports an F-statistic,
> followed by "F-test not exact for this term".
>
> A recent mention by Douglas Bates of George Box, though, hit upon an
> aspect of mixed models that has confused me.  I'm an industrial
> statistician, and studied statistics at Iowa State and the University of
> Minnesota.  I have had 3 courses in DOE, 2 at the graduate level, and
> none of them mentioned blocking factors could (should?) be modeled as
> random effects.  **Exception: the whole plots in a split plot design
> were taught as random effects.**
>
> The 2005 update to Box Hunter Hunter discusses blocking as does Wu &
> Hamada (2000).  Both texts model blocking factors such as Days and
> Batches as fixed effects.  Montgomery's DOE text, 2009 rev., pretty
> consistently states that blocks can be either random or fixed.  Don't
> have a consensus from that small sample.
>
> I'm trying to understand the implications if I consistently used random
> effects for DOE analysis.
>
> I'm quite willing to use R for mixed models, seeing as Minitab, JMP etc.
> appear to use degrees of freedom calculations that are questionable.
> But as Douglas points out --- Box said, "all models are wrong, some are
> useful" => Box's latest text doesn't bother with random effects for DOE
> =>  does it follow that for practical purposes it's OK to consider
> blocks as fixed?  There are certainly several advantages to keeping it
> simple (i.e. fixed only):
> * The analyses we (my statistics group) provide to our chemists and
> engineers are more easily understood
> * The 2-day short courses we teach in DOE to these same coworkers
> couldn't realistically get across the idea of mixed model analysis ---
> they would become less self-sufficient, where we're trying to make them
> more self-sufficient
> * We have a handful of software packages (Minitab, JMP, Design
> Expert) that can perform DOE and augment the results in a number of
> ways:
>   *** fold-over the design to resolve aliasing in fractional designs
>   *** add axial runs to enable Response Surface methods
>   *** add distributions to the input factors, enabling
> Robustness/Sensitivity analyses
>   *** running optimization algorithms to suggest the factor settings
> that simultaneously consider multiple objectives
>   *** Not to mention the loss of Sample Size Calculations, far and
> away my most frequent request
> None of these packages recognize random factors when performing
> these augmentations.
>
> Replacing this functionality with R will involve a steep learning
> curve, and is probably not entirely possible.  My coding skills in R
> consist of cutting and pasting what others have done.
>
> I don't really expect that there's a "right" answer to the question of
> random effects in DOE.  But I do believe that beyond the loss of
> p-values, there are other ramifications for advising experimenters,
> "You can't trust results from your blocking on Days (or Shifts or RM
> Lots or Batches, etc) unless they are modeled as random effects."
>
> There's statistical significance, and practical significance.  My hope
> is that while blocks as random effects are statistically "truer", their
> marginal worth over fixed effects in DOE is ignorable.  Again, I don't
> want this to come across as shooting the messenger; you are only
> laying out the current state of the art and the work that remains to
> be done.  But any insight you can provide into what's practical right
> now would be highly interesting.

Thanks for bringing up the topic, Paul.  As you and I know, you
originally sent your question to me and I encouraged you to send it to
this list.

As I wrote in my initial response to you,  "My off-the-cuff reaction
is that in these situations the effects of blocking factors are
regarded as nuisance parameters whereas in many mixed-model situations
the variances and sometimes the values of the random effects are
themselves of interest.  When the effects are
nuisance parameters the simplest approach is to model them as fixed effects."

On thinking about it more, I can imagine several different approaches
to this question.  If you just ask, "Are the levels of this blocking
factor a fixed set of levels or a random selection from a population
of possible levels?" then in most cases I imagine you would say they
are a random selection and should be modeled using random effects.
This would especially be true of what Taguchi called "environmental
factors", which, by definition, are not under the control of the
experimenter.
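
To make the contrast concrete, here is a minimal sketch in R; the
data frame 'dat' and the variables 'y', 'treat' and 'batch' are
hypothetical, and lme4 is just one choice of mixed-model package.

library(lme4)

## blocks as fixed effects: inference is conditional on these batches
fm_fixed  <- lm(y ~ treat + batch, data = dat)

## blocks as random effects: the batch effects are modeled as draws
## from a N(0, sigma_b^2) distribution over the population of batches
fm_random <- lmer(y ~ treat + (1 | batch), data = dat)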

If you say that blocking factors are not of interest per se and that
your purpose is simply to control for them, it is simpler to model
them as fixed effects.  There are two aspects to "simpler":
computationally simpler and conceptually simpler.  Of these I think
that conceptual simplicity is the more important.  The computational
burden for
fitting a mixed model versus a fixed-effects model is really a
software problem, not a hardware problem.  Commercial statistical
software like Minitab or JMP with a simple, convenient interface has
limited flexibility, in part because it is designed to have a simple,
convenient interface - the "what you see is all you get" problem.  (I
googled that phrase and got a laugh from the article at
www.computer-dictionary-online.org which referred to "point-and-drool
interfaces".)  The actual calculations involved in fitting mixed
models are not that formidable but designing the interface can be.
(One of the underappreciated aspects of the model-fitting software in
R, and in the S language in general, is the structure of the
model.frame, model.matrix sequence for transforming a formula into a
numerical representation.  This makes designing an interface much,
much easier as long as you can count on the user to input a formula.)
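
As a small illustration of that sequence (this is plain R, nothing
specific to mixed models; the toy data are made up):

## a formula plus a data frame is reduced to a numerical design
## matrix, so the interface only has to accept a formula from the user
toy <- data.frame(y = rnorm(6),
                  treat = gl(2, 3, labels = c("A", "B")),
                  batch = gl(3, 1, 6))
mf <- model.frame(y ~ treat + batch, data = toy)
X  <- model.matrix(attr(mf, "terms"), mf)  # contrasts applied here
X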

Conceptually fixed-effects models are simpler than mixed models but
they may over-simplify the analysis.  If your purpose is estimation of
fixed-effects parameters, including assessing precision of the
estimates, then you need to ask if you want to estimate those
parameters conditional on the particular levels of the blocking
factor that you observed, or with respect to the population of
possible levels that the observed sample represents.  If you are
willing to condition on the particular levels you observed, then use
fixed effects for the blocking factor.  If you want inference with
respect to all possible levels, you could use random effects.  For a
designed experiment the estimates of the fixed effects will probably
not be affected much by using random effects for the blocking factor
instead of fixed effects.  However, the precision of the estimates
may be different.  Perhaps more importantly, the precision of
predictions of future responses would be different.  I'm not sure how
one would even formulate such a prediction from a model with fixed
effects for the blocking factor if the factor was something like
"batch" and the batches from the experiment were already used up.
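
Just to sketch how that prediction could be expressed with random
effects (reusing the hypothetical fm_random fit from above, and
assuming a version of lme4 whose predict method for merMod objects
accepts the re.form argument):

## population-level prediction of a future response from a new,
## as-yet-unobserved batch; re.form = NA sets the batch effect to zero
newdat <- data.frame(treat = factor("A", levels = c("A", "B")))
predict(fm_random, newdata = newdat, re.form = NA)

## there is no analogous call for fm_fixed: predict.lm would require a
## batch level, but the experimental batches no longer exist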

Having said that the precision of the estimates of the fixed-effects
parameters would be different if random effects are used for the
blocking factor, I should admit that this is exactly the problem to
which I don't have a good general solution.

It appears that the question of fixed or random for a blocking factor
is like many others in statistics - the choice of the model depends on
what you want to do with it.
