Question hour: Th. 15.01.2015 from 2 to 3pm in HG G26.3
Exam date (no guarantee): Sat. 31.01.2015 from 9 to 11am (Höngg)
Exam review: We. 25.02.2015 from 12 to 1pm
Remark: please fill in the TA evaluation form and hand it in at the end!
15th of December 2014
As usual, this was not corrected by me… just ask me if you have questions!
Goal: deal with nuisance factors (reduce extra variability and avoid confounding)
Origin: agriculture
Examples: batches, subjects, hospital, etc.
Key element: \(SS_{tot} = SS_{treat} + SS_{block} + SS_{res}\)
What happens if you don't block?
What if you know about a nuisance factor but can't control it (e.g., people choose their hospital, or you forgot to control for it)?
Analysis of Covariance (see the sketch after this list). Why is this different? What is preferable and why?
What about a nuisance factor that you don't know about and can't control for? Randomization.
How to interpret p-values of block factor?
Model with fixed vs. random effects.
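For the ANCOVA question above, a minimal sketch (the data frame `d`, covariate `x`, and factor `treat` are made-up names):

```r
# Hypothetical ANCOVA: adjust for a nuisance variable you know about
# but could not control (e.g., a continuous covariate x such as age)
fit <- lm(y ~ x + treat, data = d)   # d, x, treat are made-up names
summary(fit)                         # treatment effect, adjusted for x
```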
Ideally, every treatment is tested in every block (a randomized complete block design): within each block, the experimental units are assigned to the treatments at random.
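A minimal sketch of such an analysis in R, with simulated data:

```r
# Simulated example: 4 treatments, each tested once in each of 5 blocks
set.seed(1)
d <- expand.grid(treat = factor(1:4), block = factor(1:5))
d$y <- rnorm(nrow(d)) + as.numeric(d$treat)   # fake response
summary(aov(y ~ treat + block, data = d))
# the ANOVA table rows reproduce SS_treat + SS_block + SS_res
```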
Sometimes it is impossible to have each treatment in each block (e.g. wine tasting with 100 wines and 20 people)
How would you solve that?
Typically, you are given the number of treatments \(n\) to test and the size of a block \(k\). These are constraints of the experiment.
For example: \(k\) is 4 wheels on a car, maximum number of wines to taste, etc.
Rule: any two treatments occur together the same number of times: \(\lambda\).
Goal: find such a design, while minimizing the number of blocks needed…
Solution: play with the equations on p.86 of the lecture notes
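As a small illustration, assuming the standard BIBD relations \(bk = nr\) and \(\lambda(n-1) = r(k-1)\) (with \(b\) blocks and \(r\) replicates per treatment): the helper below is hypothetical, and these divisibility conditions are necessary but not sufficient for a design to exist.

```r
# Smallest lambda for which a BIBD with n treatments and block size k
# can have an integer number of replicates r and of blocks b
bibd_params <- function(n, k) {
  for (lambda in 1:100) {
    r <- lambda * (n - 1) / (k - 1)   # from lambda*(n-1) = r*(k-1)
    b <- n * r / k                    # from b*k = n*r
    if (r == round(r) && b == round(b))
      return(c(lambda = lambda, r = r, b = b))
  }
}
bibd_params(n = 7, k = 3)   # classic example: lambda = 1, r = 3, b = 7
```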
```r
Sh <- read.table("http://stat.ethz.ch/Teaching/Datasets/Shunt.txt", header = TRUE)
Sh$Subject   <- as.factor(Sh$Subject)
Sh$Treatment <- as.factor(Sh$Treatment)
Sh$Time      <- as.factor(Sh$Time)
head(Sh, 10)
```
```
##    Subject Treatment Time  Y
## 1        1 Selective  Pre 51
## 2        1 Selective Post 48
## 3        2 Selective  Pre 35
## 4        2 Selective Post 55
## 5        3 Selective  Pre 66
## 6        3 Selective Post 60
## 7        4 Selective  Pre 40
## 8        4 Selective Post 35
## 9        5 Selective  Pre 39
## 10       5 Selective Post 36
```
How to recognize a split-plot design?
More info in this pdf
Fix one factor (mainplot), vary a second factor (subplot)
Example: mainplot=land with irrigation system, subplot=fertilizer type
With R, use a formula with nested factors:

```r
aov(Y ~ main * nested + Error(grouping/nested))
```

or, with lme4:

```r
lmer(Y ~ main * nested + (1 | grouping), data = Sh)
```
What goes wrong if you don't take the design into account?
If you ignore Subject: you ignore the fact that the observations are not true replicates but come from the same person => wrong inference.
If Subject is used as a fixed blocking factor: you cannot estimate your model (over-parametrized).
That's why Subject appears as a random effect!
Does the type of surgery have an effect?
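For the Shunt data, following the aov formula above (a sketch; Treatment is the between-subject factor, Time the within-subject factor):

```r
# Subject is the grouping (whole plot): Treatment varies between subjects,
# Time (Pre/Post) varies within subjects
fit <- aov(Y ~ Treatment * Time + Error(Subject/Time), data = Sh)
summary(fit)   # look at the Treatment row in the Subject stratum
```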
3 types of pizza in 6 different kinds of packaging. What kind of experiment could you run?
A chemical plant produces oxygen with some process. We want to find the optimal pressure and temperature.
Like a factorial design, but we want to apply it sequentially to find the maximum.
Steepest ascent model (First-order): \(Y = \mu + A_{temp} + B_{pressure} + \epsilon\)
From your current point, find the direction in which the response changes the most. Move in this direction and repeat.
We zoom in on a zone and approximate the response surface with a plane, then move in the direction of steepest ascent, etc.
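A minimal sketch of one step, with made-up runs and yields in coded units:

```r
# Hypothetical 2^2 factorial around the current operating point
runs <- data.frame(temp     = c(-1,  1, -1,  1),
                   pressure = c(-1, -1,  1,  1),
                   yield    = c(76, 80, 79, 84))
fit <- lm(yield ~ temp + pressure, data = runs)   # first-order (plane) model
coef(fit)[-1]   # move in the direction proportional to these two slopes
```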
Often we assume that the data are balanced. But in reality you often have missing values, an incomplete design, etc.
Such designs are said to be non-orthogonal. What does that mean? Why does it matter?
\(X_1\) and \(X_2\) are orthogonal if: \(\sum_i X_{1i} X_{2i}=0\).
| \(X_1\) | \(X_2\) | Y |
|---|---|---|
| 1 | 1 | 3 |
| 1 | -1 | 4 |
| -1 | 1 | 2 |
| -1 | -1 | 3 |
When the design is non-orthogonal, the effects cannot be estimated independently…
Think about linear regression:
If all X are independent, multiple regression is equivalent to \(p\) univariate regressions.
If not, then things change…
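You can check both points on the small design above (orthogonality, and the equivalence of joint and univariate fits):

```r
X1 <- c(1, 1, -1, -1)
X2 <- c(1, -1, 1, -1)
Y  <- c(3, 4, 2, 3)
sum(X1 * X2)                         # 0 => the two columns are orthogonal
coef(lm(Y ~ X1 + X2))                # joint fit ...
coef(lm(Y ~ X1)); coef(lm(Y ~ X2))   # ... same slopes as the separate fits
```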
Type I and type III \(SS\) are the same with an orthogonal design;
with a non-orthogonal design, not anymore.
In particular, type I \(SS\) depend on the order: aov(y~A+B) != aov(y~B+A)
Type III \(SS\) don't depend on the order, but the usual \(SS\) decomposition is lost… with R: drop1(fit)
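A minimal sketch with made-up unbalanced data (the names `d`, `A`, `B`, `y` are hypothetical):

```r
# Unbalanced two-way layout: type I SS now depend on the term order
set.seed(2)
d <- expand.grid(A = factor(1:2), B = factor(1:3))
d <- d[rep(1:6, times = c(2, 5, 3, 4, 6, 2)), ]   # unequal cell counts
d$y <- rnorm(nrow(d))
anova(aov(y ~ A + B, data = d))               # SS for A and B here ...
anova(aov(y ~ B + A, data = d))               # ... differ from here
drop1(aov(y ~ A + B, data = d), test = "F")   # order-independent tests
```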
See this link for a detailed example
Which of the following is True/False?
If P=.05, the null hypothesis has only a 5% chance of being true
A nonsignificant difference (e.g., P ≥ .05) means there is no difference between groups
If P=0.05, even if there is no difference (the null is true), there is a 5% chance to observe a difference such as the one in the data.
If an effect has a P of 0.00001 it is very important for the problem at hand
Reference: [A dirty dozen: twelve p-value misconceptions](http://www.ncbi.nlm.nih.gov/pubmed/18582619)
open book
everything is tested: ANOVA and experimental design
Some basic calculations by hand
Don't stress out: most subtasks are independent of each other. If you're stuck on one of them, continue to the next. If you need a result that you didn't obtain, assume a made-up value and move on.
Good luck!