[R] ANOVA and Pseudoreplication in R

Fri Feb 25 18:08:57 CET 2011

Hi,

As part of my dissertation I'm going to be doing an ANOVA comparing the 
"dead zone" diameters on plates of microbial growth. Little paper disks 
are "loaded" with an antimicrobial, and a clear zone appears where death 
occurs, its size depending on the strength of the antimicrobial and the 
susceptibility of the organism. So it's basically 4 different 
treatments, and I'm comparing the diameters (in mm) of circles. I'm 
concerned, however, about pseudoreplication and how to deal with it in R 
(I thought of using the Error() term).

I have four levels of one factor (called "Treatment"): NE.Dettol, 
EV.Dettol, NE.Garlic and EV.Garlic. "NE.Dettol" is E. coli not evolved 
against Dettol, exposed to Dettol to get "dead zones"; "NE.Garlic" is the 
same but with garlic instead of Dettol. "EV.Dettol" is E. coli that has 
been evolved against Dettol and then tested afterwards against Dettol to 
get the "dead zones"; the same applies to "EV.Garlic" but with garlic. 
As you can see from the four levels (or treatments), there are two 
chemicals involved, so my first concern is whether they should be 
analysed using two separate ANOVAs.

NE.Dettol and NE.Garlic are both the same organism - a lab stock E.coli, 
just exposed to two different chemicals.
EV.Dettol and EV.Garlic are, in principle, likely to be two different 
forms of the organism after the many experimental doses of their 
respective chemical.

For NE.Garlic and NE.Dettol I have 5 of what I've called "Lineages" 
each, basically separate bottles of them (10 in total).
Then I have 5 bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. This 
was done because, whilst I'm expecting them all to respond in a similar 
manner, there are many evolutionary paths to the same result, and 
previous research and reading show that occasionally one or two react 
differently from the rest through random chance.
The point I observed above ("NE.Dettol and NE.Garlic are both the same 
organism...") is also applicable to the 5 bottles: The 5 bottles each of 
NE.Garlic and NE.Dettol are supposed to be all the same organism - from 
a stock one kept in store in the lab.
There is potential though for the 5 of EV.Garlic, to be different from 
one another, and potential for the 5 EV.Dettol to be different from one 
another.

The Lineage (bottle) is therefore also a factor, with 5 levels 
(1, 2, 3, 4, 5), because the lineages may differ from one another.

To get the measurements of the diameter of the zones, I take out a small 
amount from a tube and spread it on a plate, then take three paper 
disks soaked in their respective chemical (either Dettol or garlic), 
press them onto the plate and incubate.
Then, when the zones have appeared after a day or two, I take 4 diameter 
measurements from each zone, across the zone at different angles, to 
account for the fact that the zone may be a weird shape or not quite 
circular.
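
To make the layout concrete, here is a rough sketch of how I picture the 
data frame in R. The column names (Reading, Disk, Lineage, Treatment, 
Bottle, Diameter) are just ones I made up for illustration, and the 
diameters are random placeholder numbers, not my real measurements.

```r
# Hypothetical layout: column names and all numbers are placeholders.
set.seed(1)  # only so the made-up values are reproducible

zones <- expand.grid(
  Reading   = 1:4,            # 4 diameter readings per zone
  Disk      = factor(1:3),    # 3 disks per plate
  Lineage   = factor(1:5),    # 5 bottles per treatment
  Treatment = c("NE.Dettol", "EV.Dettol", "NE.Garlic", "EV.Garlic")
)

# Lineage 1 of EV.Dettol is a different bottle from lineage 1 of EV.Garlic,
# so give every bottle its own label (4 treatments x 5 lineages).
zones$Bottle <- interaction(zones$Treatment, zones$Lineage, drop = TRUE)

# Fake diameters in mm, purely to fill the column.
zones$Diameter <- round(rnorm(nrow(zones), mean = 20, sd = 2), 1)

str(zones)
table(zones$Treatment)  # 5 bottles x 3 disks x 4 readings per treatment
```

So each treatment has 5 bottles, each bottle one plate with 3 disks, and 
each disk 4 readings.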

I'm concerned about pseudoreplication, such as the multiple readings 
from one disk, and the 5 lineages, which might be different from one 
another in each of the two "EV." treatments but not in the "NE." 
treatments.

I read that I can remove the pseudoreplication from the multiple 
readings on each disk by using the 4 readings to produce a mean for each 
disk and analysing those means, exercising caution where there are 
extreme values. I think the 3 disks for each lineage are not themselves 
pseudoreplication, because they are genuinely 3 disks on a plate (the 
"Disk Diffusion Test" replicated 3 times), but the multiple readings 
from one disk are, I feel, pseudoreplication. I've also read about 
including Error() terms in a formula.
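
For the averaging step, this is roughly what I would do, using 
aggregate() from base R. Again the data frame and its column names are 
hypothetical and the diameters are random placeholders; the per-disk 
range is just my own idea for spotting disks with extreme readings.

```r
# Placeholder data in the layout described above (not real measurements).
set.seed(1)
zones <- expand.grid(
  Reading   = 1:4,
  Disk      = factor(1:3),
  Lineage   = factor(1:5),
  Treatment = c("NE.Dettol", "EV.Dettol", "NE.Garlic", "EV.Garlic")
)
zones$Diameter <- rnorm(nrow(zones), mean = 20, sd = 2)

# Collapse the 4 readings on each disk to a single mean diameter.
disk_means <- aggregate(Diameter ~ Treatment + Lineage + Disk,
                        data = zones, FUN = mean)

# Spread of the 4 readings on each disk, to flag oddly shaped zones.
disk_range <- aggregate(Diameter ~ Treatment + Lineage + Disk,
                        data = zones, FUN = function(x) diff(range(x)))

nrow(disk_means)  # one row per disk: 4 treatments x 5 lineages x 3 disks
head(disk_means)
```

That leaves one value per disk, but the 3 disks from one bottle would 
still share that bottle.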

I don't think the two NE. treatments coming from the same culture 
introduces pseudoreplication at the Treatment factor level, because the 
two different antimicrobials have two different effects, but I'm unsure.

I was hoping for a more expert opinion on whether I have identified the 
pseudoreplication correctly, or whether there is indeed 
pseudoreplication in the 5 lineages or anywhere else I haven't seen, and 
how best to deal with it in R. At the minute my solution to the multiple 
readings from one disk is simply to make a new variable holding the disk 
means and do the ANOVA on that, or even to take the means before I load 
the dataset into R. I'm wondering if an Error() term would be correct.
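
In case it helps to see what I mean, this is my guess at how the Error() 
version might look with aov(), with disks nested within bottles. I am 
not at all sure this is the right specification. The column names are 
made up and the diameters are random placeholders, so the output means 
nothing; it is only there to show the structure.

```r
# Placeholder data in the layout described above (not real measurements).
set.seed(1)
zones <- expand.grid(
  Reading   = 1:4,
  Disk      = factor(1:3),
  Lineage   = factor(1:5),
  Treatment = c("NE.Dettol", "EV.Dettol", "NE.Garlic", "EV.Garlic")
)
# One label per physical bottle (treatment-lineage combination).
zones$Bottle   <- interaction(zones$Treatment, zones$Lineage, drop = TRUE)
zones$Diameter <- rnorm(nrow(zones), mean = 20, sd = 2)

# Guess at the model: Treatment as the fixed effect, with error strata for
# bottles, disks within bottles, and readings within disks.
fit <- aov(Diameter ~ Treatment + Error(Bottle/Disk), data = zones)

names(fit)    # the error strata that aov() has set up
summary(fit)  # Treatment should be tested in the Bottle stratum
```

If I understand it correctly, this would test Treatment against the 
bottle-to-bottle variation rather than against the individual readings.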

Thanks,
Ben W.