[R] How to interpret an ANOVA result?

Mon May 14 12:17:03 CEST 2012

Hello all,

here's a real-world example: I'm measuring a quantity (d) at five
sites (site1 thru site5) on a silicon wafer. There is a clear
site-dependence of the measured value. To find out if this is a
measurement artifact I measured the wafer four times: twice in the
normal position (posN), and twice rotated by 180 degrees (posR). My
data looks like this (full, self-contained code at bottom). Note that
sites with the same number correspond to the same physical location on
the wafer (the rotation has already been taken into account here).

> head(x)
     d site pos
1 1383    1   N
2 1377    1   R
3 1388    1   R
4 1373    1   N
5 1386    2   N
6 1394    2   R

> boxplot (d~pos+site)

This boxplot (see code) already hints at a true site-dependence of the
measured value (no artifact). OK, so let's do an ANOVA to make this
more quantitative:

> summary(lm(d ~ site*pos))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1378.000      3.078 447.672  < 2e-16 ***
site2         11.500      4.353   2.642  0.02466 *
site3         12.000      4.353   2.757  0.02025 *
site4         17.000      4.353   3.905  0.00294 **
site5          1.000      4.353   0.230  0.82294
posR           4.500      4.353   1.034  0.32561
site2:posR    -4.000      6.156  -0.650  0.53050
site3:posR   -10.500      6.156  -1.706  0.11890
site4:posR    -5.500      6.156  -0.893  0.39264
site5:posR    -3.000      6.156  -0.487  0.63655
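
A possible complement to the table above: summary() of an lm fit reports one
t-test per coefficient, each relative to the reference cell (site1, posN),
whereas anova() on the same fit gives one F-test per term (site, pos,
site:pos). A minimal sketch, assuming the data frame is rebuilt from the
listing at the bottom; the names fit and tab are only illustrative:

```r
# Same 20 measurements as in the dput() listing at the bottom of the post
x <- data.frame(
  d    = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
           1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)

# Two-way model with interaction, passing the data explicitly instead of attach()
fit <- lm(d ~ site * pos, data = x)

# One row per term plus a Residuals row; each term gets a single F-test
tab <- anova(fit)
print(tab)
```

Because every site/pos cell holds exactly two measurements, the design is
balanced, so the order of the terms in the formula should not change this
table.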

Now I think that I see the following:
- The average of d at site1 in pos. N (first in alphabet) is 1378.
- Average values for site2, 3, 4 (especially 4) in pos. N deviate
significantly from site1. For instance, values at site4 are on
average 17 greater than at site1.
- The average value at site5 does not differ significantly from site1.
OK, that was the top part of the result table. Now the bottom part:
- In reverse position (posR) the average of d at site1 is 4.5 bigger,
but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but
smaller than what? And why does this -10.5 deviation have a p-value of
.1 (not significant) vs the .02 (significant) deviation of 11.5
(site2, top part)?

Let's see if I can figure that out. Difference between posN and posR
at site3 is not so big:
> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])
[1] -6
Is this what makes it insignificant?

Shuffling around the numbers until I get to -10.5:

> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])-(mean(d[site==1&pos=="R"])-mean(d[site==1&pos=="N"]))
[1] -10.5
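
The same bookkeeping can be done for all sites at once. A small sketch,
assuming the data frame from the listing at the bottom; cell, shift and
inter are just illustrative names:

```r
# Same 20 measurements as in the dput() listing at the bottom of the post
x <- data.frame(
  d    = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
           1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)

# 5 x 2 table of cell means: rows are sites, columns are positions
cell <- with(x, tapply(d, list(site = site, pos = pos), mean))

# R-minus-N shift at each site; the site1 entry is the posR coefficient
shift <- cell[, "R"] - cell[, "N"]

# Each shift relative to the site1 shift; entries 2..5 are the
# site2:posR .. site5:posR coefficients
inter <- shift - shift["1"]
print(cell)
print(inter)
```

So each site:posR row is a difference of differences: the R-minus-N
shift at that site minus the R-minus-N shift at site1.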

OK, one has to keep track of all the differences and stuff.
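
On the earlier question of why -10.5 gets p = .12 while 11.5 gets
p = .02: the t value is the estimate divided by its standard error, and
the standard errors differ (6.156 vs 4.353 in the table). A sketch of
where they come from, assuming two measurements per cell and a common
residual standard deviation; the se_* names are only illustrative:

```r
# Same 20 measurements as in the dput() listing at the bottom of the post
x <- data.frame(
  d    = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
           1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)
fit <- lm(d ~ site * pos, data = x)

s <- summary(fit)$sigma   # residual standard deviation of the fit
n <- 2                    # measurements per site/pos cell

se_cell_mean <- s / sqrt(n)      # (Intercept): a single cell mean
se_diff      <- s * sqrt(2 / n)  # site2..site5 and posR: difference of two cell means
se_inter     <- s * sqrt(4 / n)  # site:pos rows: combination of four cell means

# Compare with the "Std. Error" column of the coefficient table
print(c(se_cell_mean, se_diff, se_inter))
print(coef(summary(fit))[, "Std. Error"])
```

An interaction estimate combines four cell means rather than two, so it
is noisier by a factor of sqrt(2), and a deviation of similar size ends
up with a smaller t value.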

So I think I have understood about 80% of this simple example. The
reason I'm going after this so stubbornly is that I'm at the beginning
of a DOE which will take several weeks of measuring and will end up
being analyzed with a big ANOVA (two response variables and about six
explanatory variables, some continuous, some categorical). Already in
the DOE phase I want to understand what I will be doing with the data
later (this is for a Six Sigma project in an industrial production
environment, in case anybody wants to know).

Thanks,
robert

Here's the full dataset:

x <- structure(list(d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L,
1386L, 1393L, 1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L,
1395L, 1378L, 1382L, 1379L, 1380L), site = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
    pos = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
    2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("N",
    "R"), class = "factor")), .Names = c("d", "site", "pos"), row.names = c(NA,
-20L), class = "data.frame")
attach(x)
head(x)
boxplot (d~pos+site)