[R] Understanding lm-based analysis of fractional factorial experiments

Wed Mar 6 15:17:38 CET 2013

Hi,

On Wed, Mar 6, 2013 at 5:46 AM, Kjetil Kjernsmo <kjekje at ifi.uio.no> wrote:
> All,
>
> I have just returned to R after a decade of absence, and it is good to see
> that R has become such a great success! I'm trying to bring Design of
> Experiments into some aspects of software performance evaluation, and to
> teach myself that, I picked up "Experiments: Planning, Analysis and
> Optimization" by Wu and Hamada. I try to reproduce an analysis in the book
> using lm, but have to conclude I don't understand what lm does in this
> context, even though I end up at the desired result. I'm currently using R
> 2.15.2 on a recent Fedora system, but I get the same result on Debian Wheezy
> and Debian Squeeze. I think the discussion below can be followed without
> having the book at hand though.

I have my doubts...

>
> I'm working with tables 5.2 and 5.5 in the above mentioned book. Table 5.2
> contains data from the "Leaf spring experiment". The dataset is also in this
> zip file:
>
> ftp://ftp.wiley.com/public/sci_tech_med/experiments-planning/data%20sets.zip
>
> I've learned from the book that the effects can be found using a linear
> model and double the coefficients. So, I do
>> leaf <-
>> read.table("/ifi/bifrost/a03/kjekje/fag/experimental-planning/book-datasets/LeafSpring
>> table 5.2.dat", col.names=c("B", "C", "D", "E", "Q", paste("r", 1:3,
>> sep=""), "yavg", "ssq", "lnssq"))
>> leaf.lm <- lm(yavg ~ B * C * D * E * Q, data=leaf)

That is complete nonsense:

> dim(leaf)
[1] 16 11
> length(coef(leaf.lm))
[1] 32

So you are trying to estimate 32 coefficients from 16 data points.
That is never going to work.

>> leaf.lm
>
> Call:
> lm(formula = yavg ~ B * C * D * E * Q, data = leaf)
>
> Coefficients:
>    (Intercept)              B+              C+              D+      E+
>        7.54000         0.07003         0.32333        -0.09668 0.07668
>             Q+           B+:C+           B+:D+           C+:D+   B+:E+
>       -0.33670         0.01335         0.11995         0.02335      NA
>          C+:E+           D+:E+           B+:Q+           C+:Q+   D+:Q+
>             NA              NA         0.22915        -0.25745 0.28255
>          E+:Q+        B+:C+:D+        B+:C+:E+        B+:D+:E+ C+:D+:E+
>        0.05415              NA              NA              NA      NA
>       B+:C+:Q+        B+:D+:Q+        C+:D+:Q+        B+:E+:Q+ C+:E+:Q+
>        0.04160        -0.16160        -0.18840              NA      NA
>       D+:E+:Q+     B+:C+:D+:E+     B+:C+:D+:Q+     B+:C+:E+:Q+ B+:D+:E+:Q+
>             NA              NA              NA              NA      NA
>    C+:D+:E+:Q+  B+:C+:D+:E+:Q+
>             NA              NA
>
> (seems there is little I can do about the line breaks here, sorry)
>
> However, the book (table 5.5), has 0.221 for the main effect of B and 0.176,
> and the above is neither this, nor half of it. Now, I can reproduce what's
> in the book with
>
>> lm(yavg ~ B, data=leaf)
>
> Call:
> lm(formula = yavg ~ B, data = leaf)
>
> Coefficients:
> (Intercept)           B+
>      7.5254       0.2213

>
>> lm(yavg ~ C, data=leaf)
>
> Call:
> lm(formula = yavg ~ C, data = leaf)
>
> Coefficients:
> (Intercept)           C+
>      7.5479       0.1763
>
> Assuming lm does in fact double the coefficient in this case,

I have no idea what this means.

 but here the
> intercept varies, which doesn't seem correct,

You mean that the intercept for

lm(yavg ~ B, data=leaf)

differs from the intercept for

lm(yavg ~ C, data=leaf)

? If so that is expected. The intercept is the expected value of yavg
when all predictors are zero. The expected value for B = zero does not
have to be the same as the expected value for C = 0.

 nor can I as trivially find
> the interactions the same way.

What way?

>
> Now, I try the effects() function, and get familiar numbers:
>> effects(leaf.lm)
> (Intercept)          B+          C+          D+          E+          Q+
>   -30.54415    -0.44250     0.35250    -0.05750    -0.20750    -0.51920
>       B+:C+       B+:D+       C+:D+       B+:Q+       C+:Q+       D+:Q+
>    -0.03415    -0.03915     0.07085    -0.16915     0.33085    -0.10755
>       E+:Q+    B+:C+:Q+    B+:D+:Q+    C+:D+:Q+
>     0.05415    -0.02080     0.08080    -0.09420
>
> and indeed, I have verified that effects(leaf.lm)/2 gives me the expected
> result.
>
> So, I have found the correct answer, but I don't understand why. I have read
> the documentation for effects() as well as looked through the relevant
> chapter in "Statistical Models in S", but from that all I got was that I
> suppose there is a hint in the phrase "the effects are the uncorrelated
> single-degree-of-freedom", and that is somewhat different from the
> coefficients, but I can't make out from the book (Wu & Hamada) why the
> coefficients should be any different than the effects, to the contrary, it
> is quite clear from equation (5.8) in the book that the coefficients they
> use are effects(leaf.lm)/4.
>
> So, there are at least two points of confusion here, one is how coef()
> differs from effects() in the case of fractional factorial experiments, and
> the other is the factor 1/4 between the coefficients used by Wu & Hamada and
> the values returned by effects() as I would think from theory I've read that
> it should be a factor 2.

I don't know the answers to these questions (I don't really understand
the effects function). So I'm afraid all I've done is add more
questions, i.e., why in the world are we estimating 32 coefficients
from 16 points?

Best,
Ista

>
> Best regards,
>
> Kjetil
> --
> Kjetil Kjernsmo
> PhD Research Fellow, University of Oslo, Norway
> Semantic Web / SPARQL Query Federation
> kjekje at ifi.uio.no http://www.kjetil.kjernsmo.net/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.