[R] How to interpret an ANOVA result?
Robert Latest
boblatest at gmail.com
Mon May 14 12:17:03 CEST 2012
Hello all,
here's a real-world example: I'm measuring a quantity (d) at five
sites (site1 thru site5) on a silicon wafer. There is a clear
site-dependence of the measured value. To find out if this is a
measurement artifact I measured the wafer four times: twice in the
normal position (posN), and twice rotated by 180 degrees (posR). My
data looks like this (full, self-contained code at bottom). Note that
sites with the same number correspond to the same physical location on
the wafer (the rotation has already been taken into account here).
> head(x)
     d site pos
1 1383    1   N
2 1377    1   R
3 1388    1   R
4 1373    1   N
5 1386    2   N
6 1394    2   R
> boxplot (d~pos+site)
This boxplot (see code) already hints at a true site-dependence of the
measured value (no artifact). OK, so let's do an ANOVA to make this
more quantitative:
> summary(lm(d ~ site*pos))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1378.000      3.078 447.672  < 2e-16 ***
site2         11.500      4.353   2.642  0.02466 *
site3         12.000      4.353   2.757  0.02025 *
site4         17.000      4.353   3.905  0.00294 **
site5          1.000      4.353   0.230  0.82294
posR           4.500      4.353   1.034  0.32561
site2:posR    -4.000      6.156  -0.650  0.53050
site3:posR   -10.500      6.156  -1.706  0.11890
site4:posR    -5.500      6.156  -0.893  0.39264
site5:posR    -3.000      6.156  -0.487  0.63655
Now I think that I see the following:
- The average of d at site1 in pos. N (first in alphabet) is 1378.
- Average values for site2, 3, 4 (especially 4) in pos. N deviate
significantly from site1. For instance, values at site4 are on
average 17 greater than at site1.
- The average value at site5 does not differ significantly from site1.
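The reading of the top part can be checked against the raw cell means. This is a minimal sketch, assuming R's default treatment contrasts; the data frame is rebuilt compactly from the dataset at the bottom, and the names cell and top are illustrative only.

```r
# Same 20 measurements as in the full dataset, rebuilt compactly
x <- data.frame(
  d = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
        1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos = factor(rep(c("N", "R", "R", "N"), times = 5))
)

# Mean of d in each site/pos cell (rows: site, columns: pos)
cell <- with(x, tapply(d, list(site, pos), mean))
print(cell)

# Under treatment contrasts the intercept is the site1/N cell mean, and
# each siteK coefficient is the site K mean in pos N minus that baseline
top <- cell[, "N"] - cell["1", "N"]
print(top)   # site1..site5: 0, 11.5, 12, 17, 1
```

The nonzero entries of top match the site2 to site5 estimates in the table, so these rows compare sites within pos N only.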
OK, that was the top part of the result table. Now the bottom part:
- In reverse position (posR) the average of d at site1 is 4.5 bigger,
but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but
smaller than what? And why does this -10.5 deviation have a p-value of
.1 (not significant) vs the .02 (significant) deviation of 11.5
(site2, top part)?
Let's see if I can figure that out. Difference between posN and posR
at site3 is not so big:
> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])
[1] -6
Is this what makes it insignificant?
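One possible reading of the significance question: the t value is the estimate divided by its standard error, and the table lists a larger standard error for the interaction rows (6.156) than for the site rows (4.353). The sketch below assumes the balanced layout used here (two measurements per cell) and checks that a site coefficient behaves like a difference of two cell means, while an interaction coefficient behaves like a combination of four cell means. The names s, n, se_main and se_int are illustrative only.

```r
# Same 20 measurements as in the full dataset, rebuilt compactly
x <- data.frame(
  d = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
        1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos = factor(rep(c("N", "R", "R", "N"), times = 5))
)

fit <- lm(d ~ site * pos, data = x)
s <- summary(fit)$sigma   # residual standard error of the fit
n <- 2                    # measurements per site/pos cell (assumed balanced)

se_main <- s * sqrt(2 / n)   # difference of two cell means
se_int  <- s * sqrt(4 / n)   # combination of four cell means

# Compare with the standard errors reported by summary()
se_tab <- coef(summary(fit))[, "Std. Error"]
print(c(se_main = se_main, table_site2 = unname(se_tab["site2"])))
print(c(se_int = se_int, table_site3_posR = unname(se_tab["site3:posR"])))
```

Under these assumptions an estimate of -10.5 with standard error 6.156 gives |t| of about 1.7, while 11.5 with standard error 4.353 gives about 2.6, which would account for the different p-values despite similar magnitudes.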
Shuffling around the numbers until I get to -10.5:
> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])-(mean(d[site==1&pos=="R"])-mean(d[site==1&pos=="N"]))
[1] -10.5
OK, one has to keep track of all the differences and stuff.
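The same bookkeeping can be done for all sites at once. This is a minimal sketch, again assuming default treatment contrasts with site1 and pos N as reference levels; cell, delta and dd are illustrative names.

```r
# Same 20 measurements as in the full dataset, rebuilt compactly
x <- data.frame(
  d = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
        1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos = factor(rep(c("N", "R", "R", "N"), times = 5))
)

cell <- with(x, tapply(d, list(site, pos), mean))

# R-minus-N difference at each site
delta <- cell[, "R"] - cell[, "N"]
print(delta)   # site1..site5: 4.5, 0.5, -6, -1, 1.5

# Difference of differences: how much each site's R-minus-N shift
# deviates from the shift seen at site1
dd <- delta - delta["1"]
print(dd)      # site1..site5: 0, -4, -10.5, -5.5, -3

# These should agree with the siteK:posR rows of the coefficient table
fit <- lm(d ~ site * pos, data = x)
print(coef(fit)[grep(":", names(coef(fit)))])
```

So the -10.5 for site3:posR is "smaller than" the R-minus-N shift at site1, not smaller than any single cell mean.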
So I think I have understood about 80% of this simple example. The
reason I'm going after this so stubbornly is that I'm at the beginning
of a DOE which will take several weeks of measuring and will end up
being analyzed with a big ANOVA (two response variables and about six
explanatory variables, some continuous, some categorical). Already in
the DOE phase I want to understand what I will be doing with the data
later (this is for a Six Sigma project in an industrial production
environment, in case anybody wants to know).
Thanks,
robert
Here's the full dataset:
x <- structure(list(d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L,
1386L, 1393L, 1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L,
1395L, 1378L, 1382L, 1379L, 1380L), site = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
pos = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("N",
"R"), class = "factor")), .Names = c("d", "site", "pos"), row.names = c(NA,
-20L), class = "data.frame")
attach(x)
head(x)
boxplot (d~pos+site)