[R-sig-ME] Should blocking factors be modeled as random effects?
Prew, Paul
Paul.Prew at ecolab.com
Thu Feb 5 02:20:37 CET 2009
Thanks to everyone who replied to my initial query. I've appreciated the
different angles presented for how random effects fit with designed
experiments.
------------------------------
Message: 2
Date: Sat, 31 Jan 2009 10:20:29 -0600
From: Douglas Bates <bates at stat.wisc.edu>
Subject: Re: [R-sig-ME] Should blocking factors be modeled as random
effects?
To: "Prew, Paul" <Paul.Prew at ecolab.com>
Cc: r-sig-mixed-models at r-project.org
On Mon, Jan 26, 2009 at 2:42 PM, Prew, Paul <Paul.Prew at ecolab.com>
wrote:
> I have been following your R discussion list on mixed modeling for a
> few weeks, in hopes of understanding mixed modeling better. And it
> has helped. I was not aware of the controversy surrounding degrees of
> freedom and the distribution of test statistics. I have just been
> trusting the ANOVA output from software (Minitab, JMP) that reported F
> tests. JMP uses Kenward-Roger, Minitab's ANOVA reports an
> F-statistic, followed by "F-test not exact for this term".
>
> A recent mention by Douglas Bates of George Box, though, hit upon an
> aspect of mixed models that has confused me. I'm an industrial
> statistician, and studied statistics at Iowa State and the University
> of Minnesota. I have had 3 courses in DOE, 2 at the graduate level,
> and none of them mentioned blocking factors could (should?) be modeled
> as random effects. **Exception: the whole plots in a split plot
> design were taught as random effects.**
>
> The 2005 update to Box Hunter Hunter discusses blocking as does Wu &
> Hamada (2000). Both texts model blocking factors such as Days and
> Batches as fixed effects. Montgomery's DOE text, 2009 rev., pretty
> consistently states that blocks can be either random or fixed. Don't
> have a consensus from that small sample.
>
> I'm trying to understand the implications if I consistently used
> random effects for DOE analysis.
>
> I'm quite willing to use R for mixed models, seeing as Minitab, JMP,
> etc. appear to use degrees of freedom calculations that are
> questionable.
> But as Douglas points out --- Box said, "all models are wrong, some
> are useful" => Box's latest text doesn't bother with random effects
> for DOE => does it follow that for practical purposes it's OK to
> consider blocks as fixed? There are certainly several advantages to
> keeping it simple (i.e. fixed only):
> * The analyses we (my statistics group) provide to our chemists and
> engineers are more easily understood
> * The 2-day short courses we teach in DOE to these same coworkers
> couldn't realistically get across the idea of mixed model analysis ---
> they would become less self-sufficient, where we're trying to make
> them more self-sufficient
> * We have a handful of software packages (Minitab, JMP, Design
> Expert) that can perform DOE and augment the results in a number of
> ways:
> *** fold over the design to resolve aliasing in fractional designs
> *** add axial runs to enable Response Surface methods
> *** add distributions to the input factors, enabling
> Robustness/Sensitivity analyses
> *** run optimization algorithms to suggest the factor settings
> that simultaneously consider multiple objectives
> ***** Not to mention the loss of Sample Size Calculations, far and
> away my most frequent request.
> None of these packages recognize random factors when performing
> these augmentations.
>
> Replacing this functionality with R is going to be a high learning
> curve, and probably not entirely possible. My coding skills in R
> consist of cutting and pasting what others have done.
>
> I don't really expect that there's a "right" answer to the question of
> random effects in DOE. But I do believe that beyond the loss of
> p-values, there are other ramifications for advising experimenters,
> "You can't trust results from your blocking on Days (or Shifts or RM
> Lots or Batches, etc) unless they are modeled as random effects."
>
> There's statistical significance, and practical significance. My hope
> is that while modeling blocks as random effects is statistically
> "truer", the marginal worth over fixed effects in DOE is ignorable.
> Again, I don't
> want this to come across as shooting the messenger, you are only
> laying out the current state of art and the work that remains to be
> done. But any insight you can provide into what's practical right now
> would be highly interesting.
Thanks for bringing up the topic, Paul. As you and I know, you
originally sent your question to me and I encouraged you to send it to
this list.
As I wrote in my initial response to you, "My off-the-cuff reaction is
that in these situations the effects of blocking factors are regarded as
nuisance parameters whereas in many mixed-model situations the variances
and sometimes the values of the random effects are themselves of
interest. When the effects are nuisance parameters the simplest
approach is to model them as fixed effects."
On thinking about it more, I can imagine several different approaches to
this question. If you just ask, "Are the levels of this blocking factor
a fixed set of levels or a random selection from a population of
possible levels?" then in most cases I imagine you would say they are a
random selection and should be modeled using random effects.
This would especially be true of what Taguchi called "environmental
factors" which, by definition, are not under the control of the
experimenter.
If you say that blocking factors are not of interest per se and that
your purpose is simply to control for them, it is simpler to model them
as fixed effects. There are two aspects to "simpler": computationally
simpler and conceptually simpler. Of these I think that the conceptual
aspect is more important. The computational burden for fitting a
mixed model versus a fixed-effects model is really a software problem,
not a hardware problem. Commercial statistical software like Minitab or
JMP with a simple, convenient interface has limited flexibility, in part
because it is designed to have a simple, convenient interface - the
"what you see is all you get" problem. (I googled that phrase and got a
laugh from the article at www.computer-dictionary-online.org which
referred to "point-and-drool interfaces".) The actual calculations
involved in fitting mixed models are not that formidable but designing
the interface can be.
(One of the underappreciated aspects of the model-fitting software in R,
and in the S language in general, is the structure of the model.frame,
model.matrix sequence for transforming a formula into a numerical
representation. This makes designing an interface much, much easier
as long as you count on the user to input a formula.)
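
As a rough illustration of the formula-to-matrix step described above,
here is a minimal Python sketch (not R's actual implementation) of the
kind of matrix that treatment-contrast dummy coding produces when a
blocking factor is entered as a fixed effect. The function
`design_matrix` and the toy `treatment`/`batch` data are hypothetical
names made up for this example.

```python
def design_matrix(rows, factors):
    """Dummy-code categorical factors using treatment contrasts.

    rows    -- list of dicts, one per observation (hypothetical toy data)
    factors -- factor names to expand, in column order
    The first level of each factor (in sorted order) is the reference
    level and is absorbed into the intercept column.
    """
    levels = {f: sorted({row[f] for row in rows}) for f in factors}
    names = ["(Intercept)"]
    for f in factors:
        names += [f + str(level) for level in levels[f][1:]]
    matrix = []
    for row in rows:
        line = [1]  # intercept column
        for f in factors:
            line += [1 if row[f] == level else 0 for level in levels[f][1:]]
        matrix.append(line)
    return names, matrix


# Two treatments, each run once in each of three batches.
runs = [{"treatment": t, "batch": b}
        for b in ("b1", "b2", "b3") for t in ("A", "B")]
names, X = design_matrix(runs, ["treatment", "batch"])
print(names)
for line in X:
    print(line)
```

Each block level beyond the reference level costs one column, and a
batch that was not part of the experiment has no column at all.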
Conceptually fixed-effects models are simpler than mixed models but they
may over-simplify the analysis. If your purpose is estimation of
fixed-effects parameters, including assessing precision of the
estimates, then you need to ask if you want to estimate those parameters
conditional on the particular levels of the blocking factor that you
observed or with respect to the possible values of the blocking factors
that are represented by the sample you observed. If you are willing to
condition on the particular levels you observed then use fixed effects
for the blocking factor. If you want inferences that cover all possible
levels of the blocking factor you could use random effects. For a
designed experiment the estimates
of the fixed effects will probably not be affected much by using random
effects for the blocking factor instead of fixed effects. However the
precision of the estimates may be different. Perhaps more importantly,
the precision of predictions of future responses would be different.
I'm not sure how one would even formulate such a prediction from a
model with fixed effects for the blocking factor if the factor was
something like "batch" and the batches from the experiment were already
used up.
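
The distinction above can be sketched numerically. The following Python
example is an illustration under stated assumptions, not a mixed-model
fit: it simulates a balanced randomized complete block design and uses
the classical ANOVA (method-of-moments) variance-component estimates.
The function names `simulate_rcbd` and `analyze` and all numeric
settings are invented for the example.

```python
import math
import random


def simulate_rcbd(n_blocks=8, effects=(0.0, 1.5, 3.0),
                  sd_block=2.0, sd_error=1.0, seed=1):
    """Simulate a randomized complete block design: one run of every
    treatment in every block. All numeric settings are made up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_blocks):
        block_shift = rng.gauss(0.0, sd_block)
        data.append([10.0 + tau + block_shift + rng.gauss(0.0, sd_error)
                     for tau in effects])
    return data


def analyze(y):
    """Compare fixed-block and random-block summaries for balanced data."""
    b, t = len(y), len(y[0])
    grand = sum(map(sum, y)) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(row[j] for row in y) / b for j in range(t)]
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_error = sum((y[i][j] - block_means[i] - trt_means[j] + grand) ** 2
                   for i in range(b) for j in range(t))
    ms_block = ss_block / (b - 1)
    ms_error = ss_error / ((b - 1) * (t - 1))
    # ANOVA (method-of-moments) estimate of the between-block variance,
    # truncated at zero.
    var_block = max(0.0, (ms_block - ms_error) / t)
    return {
        "contrast": trt_means[1] - trt_means[0],
        # Block effects cancel in within-block differences, so this
        # standard error is the same under either treatment of blocks.
        "se_contrast": math.sqrt(2 * ms_error / b),
        # Treatment mean, conditional on the blocks actually observed.
        "se_mean_fixed": math.sqrt(ms_error / b),
        # Treatment mean, averaged over the population of blocks.
        "se_mean_random": math.sqrt((var_block + ms_error) / b),
        # One future run in a new, unobserved block.
        "sd_new_block": math.sqrt((var_block + ms_error) * (1 + 1 / b)),
    }


y = simulate_rcbd()
for name, value in analyze(y).items():
    print(f"{name:15s} {value:.3f}")
```

In this balanced, complete-block case the treatment contrast and its
standard error do not depend on whether blocks are fixed or random.
The difference shows up in the precision of a treatment mean and in
the prediction for a new batch, both of which need the between-block
variance. In unbalanced or incomplete designs the contrasts themselves
can differ as well.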
Having said that the precision of the estimates of the fixed-effects
parameters would be different if random effects are used for the
blocking factor, I should admit that this is exactly the problem for
which I don't have a good general solution.
It appears that the question of fixed or random for a blocking factor is
like many others in statistics - the choice of the model depends on what
you want to do with it.