[R-sig-ME] Fwd: Re: [R] ANOVA and Pseudoreplication in R

Sun Feb 27 23:24:12 CET 2011

[If, in an experimental design, treatments are applied to whole plots
while observations are made for an equal number of subplots in
each plot, then a standard analysis based on plot means is exactly
equivalent to an analysis that has a random plot effect.  If the number
of subplots per plot is not too unequal, then the two analyses will
be pretty much equivalent.  Or issues of heterogeneity of variance
may make differences in numbers of subplots per plot an issue
of more minor consequence.]
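
[A minimal sketch of the balanced-case equivalence, in plain Python
rather than R.  The data, the plot labels, and both function names are
hypothetical and for illustration only.  It assumes that the "analysis
that has a random plot effect" is the classical nested ANOVA, in which
the treatment mean square is tested against the plot-within-treatment
mean square; for balanced data that F ratio matches the one from a
one-way ANOVA on the plot means.]

```python
from statistics import mean

# Hypothetical balanced data: treatment -> plot -> subplot readings.
data = {
    "A": {"p1": [10.0, 11.0, 12.0], "p2": [13.0, 12.0, 14.0], "p3": [11.0, 10.0, 12.0]},
    "B": {"p4": [15.0, 16.0, 14.0], "p5": [17.0, 18.0, 16.0], "p6": [14.0, 15.0, 16.0]},
}


def f_from_plot_means(data):
    """One-way ANOVA F statistic for treatment, computed on plot means only."""
    means = {t: [mean(ys) for ys in plots.values()] for t, plots in data.items()}
    grand = mean(m for ms in means.values() for m in ms)
    n_trt = len(means)
    n_plots = sum(len(ms) for ms in means.values())
    ss_between = sum(len(ms) * (mean(ms) - grand) ** 2 for ms in means.values())
    ss_within = sum((m - mean(ms)) ** 2 for ms in means.values() for m in ms)
    return (ss_between / (n_trt - 1)) / (ss_within / (n_plots - n_trt))


def f_from_nested_anova(data):
    """Treatment F statistic from all subplot readings, using the
    plot-within-treatment mean square as the error term."""
    all_obs = [y for plots in data.values() for ys in plots.values() for y in ys]
    grand = mean(all_obs)
    n_trt = len(data)
    n_plots = sum(len(plots) for plots in data.values())
    ss_trt = 0.0
    ss_plot = 0.0
    for plots in data.values():
        trt_obs = [y for ys in plots.values() for y in ys]
        trt_mean = mean(trt_obs)
        ss_trt += len(trt_obs) * (trt_mean - grand) ** 2
        for ys in plots.values():
            ss_plot += len(ys) * (mean(ys) - trt_mean) ** 2
    return (ss_trt / (n_trt - 1)) / (ss_plot / (n_plots - n_trt))


if __name__ == "__main__":
    print("F from plot means:  ", f_from_plot_means(data))
    print("F from nested ANOVA:", f_from_nested_anova(data))
```

[With equal numbers of subplots per plot, both sums of squares in the
nested analysis are the plot-means versions multiplied by the common
number of subplots, so the factor cancels in the ratio.  With unequal
numbers it no longer cancels exactly.]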

[These considerations apply far more widely.  Taking plot or suchlike 
means may greatly simplify the calculations.  Or use of a summary
statistic other than the mean, maybe the median, may sometimes be
appropriate.  So I do not think it quite accurate to describe the idea
of taking means, in circumstances not unlike Ben's, as baloney!]

In Ben's case, taking means for the disks, but checking that the mean 
really is the appropriate summary statistic, is an entirely reasonable 
way to proceed.  That leads to a simplified data set for the further
analysis.
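
A minimal sketch of that reduction step, in plain Python rather than R.
The readings, the key layout (treatment, lineage, disk), the function
name, and the 1 mm tolerance are all hypothetical, chosen only for
illustration.  Each disk's four diameters are collapsed to one row, and
a disk is flagged for a closer look when its mean and median disagree,
which is one crude way of checking whether the mean is a sensible
summary.

```python
from statistics import mean, median

# Hypothetical readings in mm: (treatment, lineage, disk) -> 4 diameters per zone.
readings = {
    ("EV.Dettol", 1, 1): [12.0, 12.5, 11.5, 12.0],
    ("EV.Dettol", 1, 2): [13.0, 13.5, 12.5, 19.0],  # one extreme reading
    ("NE.Dettol", 1, 1): [18.0, 18.5, 17.5, 18.0],
}


def summarise_disks(readings, tolerance=1.0):
    """Collapse each disk's readings to a single row.

    `tolerance` (mm) is an arbitrary, assumed threshold: a disk is
    flagged when its mean and median differ by more than this amount.
    """
    rows = []
    for (treatment, lineage, disk), diameters in sorted(readings.items()):
        m = mean(diameters)
        med = median(diameters)
        rows.append({
            "treatment": treatment,
            "lineage": lineage,
            "disk": disk,
            "mean": m,
            "median": med,
            "check": abs(m - med) > tolerance,
        })
    return rows


if __name__ == "__main__":
    for row in summarise_disks(readings):
        print(row)
```

The resulting rows, one per disk, are the simplified data set; any
flagged disk is a prompt to look at the raw readings before deciding
between the mean and another summary.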

I have sympathy for Ben's difficulty in giving a description that is
clear and complete enough to allow sensible advice.  In a project
that has some element of novelty, the design parameters are typically
not well enough understood to give cogent reasons for choosing one
design rather than another.  Even with the best available sources of
advice, the first experiment should usually be treated as, to a greater
or lesser extent, exploratory.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 27/02/2011, at 12:31 PM, Bert Gunter wrote:

> Hi Ben:
> 
> 1) Confession: I did not and have not read your post in detail.
> 
> 2) IMHO, the following advice:
> 
>> "Pseudo replication is really about a lack of independence between
>> measurements, So you need to work backwards and see where you are building
>> in a known lack of independence.  And where that is the case you need to use
>> means of all the values."
> 
> is baloney. The "lack of independence" part is correct. The baloney
> part is "take the mean." That is exactly where mixed models -- or
> hierarchical modeling of some sort -- are required. That's why I
> referred you to R-sig-mixed-models.
> 
> Again, please note: IMHO. Maybe I'm the one full of baloney. It's free
> advice, after all, so beware of what you get.
> 
> Cheers,
> Bert
> 
> 
> On Sat, Feb 26, 2011 at 7:44 AM, Ben Ward <benjamin.ward at bathspa.org> wrote:
>> On 25/02/2011 21:22, Ben Ward wrote:
>>> 
>>> -------- Original Message --------
>>> Subject:        Re: [R] ANOVA and Pseudoreplication in R
>>> Date:   Fri, 25 Feb 2011 12:10:14 -0800
>>> From:   Bert Gunter<gunter.berton at gene.com>
>>> To:     Ben Ward<benjamin.ward at bathspa.org>
>>> CC:     r-help<r-help at r-project.org>
>>> 
>>> 
>>> 
>>> I can hopefully save bandwidth here by suggesting that this belongs on
>>> the R-sig-mixed-models list.
>>> 
>>> -- Bert
>>> 
>>> As an aside, shouldn't you be figuring this out yourself or seeking local
>>> consulting expertise?
>> 
>> I did consult with the lecturer at university that knows most about stats,
>> and he advised me:
>> 
>> "Pseudo replication is really about a lack of independence between
>> measurements, So you need to work backwards and see where you are building
>> in a known lack of independence.  And where that is the case you need to use
>> means of all the values."
>> 
>> 
>> And I have done this and came to the conclusion I mentioned as to where I
>> thought pseudoreplication was coming from. However, I do not know about the
>> one other 'potential' source, as it really is, for me at least, a grey area.
>> I've consulted a few forums that deal with the theory more and await any
>> response. Until then I'll have to try and get as many opinions on it as
>> possible.
>> 
>> -Ben W.
>> 
>>> On Fri, Feb 25, 2011 at 9:08 AM, Ben Ward<benjamin.ward at bathspa.org>
>>> wrote:
>>>> 
>>>> Hi. As part of my dissertation, I'm going to be doing an ANOVA comparing
>>>> the "dead zone" diameters on plates of microbial growth with little paper
>>>> disks "loaded" with antimicrobial. A clear zone appears where death
>>>> occurs, the size depending on the strength and susceptibility. So it's
>>>> basically 4 different treatments, and I'm comparing the diameters (in mm)
>>>> of circles. I'm concerned, however, about pseudoreplication and how to
>>>> deal with it in R (I thought of using the Error() term).
>>>> 
>>>> I have four levels of one factor (called "Treatment"): NE.Dettol,
>>>> EV.Dettol, NE.Garlic, EV.Garlic. ("NE.Dettol" is E.coli not evolved to
>>>> dettol, exposed to dettol to get "dead zones". And the same for
>>>> NE.Garlic, but with garlic, not dettol. "EV.Dettol" is E.coli that has
>>>> been evolved against dettol, and then tested afterwards against dettol
>>>> to get the "dead zones". Same applies for "EV.Garlic" but with garlic.)
>>>> You see from the four levels (or treatments) there are two chemicals
>>>> involved. So my first concern is whether they should be analysed using
>>>> two separate ANOVAs.
>>>> 
>>>> NE.Dettol and NE.Garlic are both the same organism - a lab stock E.coli,
>>>> just exposed to two different chemicals.
>>>> EV.Dettol and EV.Garlic are, in principle, likely to be two different
>>>> forms of the organism after the many experimental doses of their
>>>> respective chemical.
>>>> 
>>>> For NE.Garlic and NE.Dettol I have 5 of what I've called "Lineages",
>>>> basically separate bottles of them (10 in total).
>>>> Then I have 5 bottles (Lineages) of EV.Dettol, and 5 of EV.Garlic. This
>>>> was done because there was the possibility that, whilst I'm expecting
>>>> them all to respond in a similar manner, there are many evolutionary
>>>> paths to the same result, and previous research and reading show that
>>>> occasionally one or two react differently to the rest through random
>>>> chance.
>>>> The point I observed above ("NE.Dettol and NE.Garlic are both the same
>>>> organism...") is also applicable to the 5 bottles: the 5 bottles each of
>>>> NE.Garlic and NE.Dettol are supposed to be all the same organism - from
>>>> a stock one kept in store in the lab.
>>>> There is potential, though, for the 5 of EV.Garlic to be different from
>>>> one another, and potential for the 5 EV.Dettol to be different from one
>>>> another.
>>>> 
>>>> The Lineage (bottle) is also a factor then, with 5 levels (1,2,3,4,5).
>>>> Because they may be different.
>>>> 
>>>> To get the measurements of the diameter of the zones, I take out a small
>>>> amount from a tube and spread it on a plate, then take three paper
>>>> disks, soaked in their respective chemical, either Dettol or Garlic, and
>>>> press them and incubate them.
>>>> Then, when the zones have appeared after a day or 2, I take 4 diameter
>>>> measurements from each zone, across the zone at different angles, to
>>>> account for the fact that there may be a weird shape, or a zone that is
>>>> not quite circular.
>>>> 
>>>> I'm concerned about pseudoreplication, such as the multiple readings
>>>> from one disk, and the 5 lineages - which might be different from one
>>>> another in each of the two "EV." treatments, but not with "NE."
>>>> treatments.
>>>> 
>>>> I read that I can remove pseudoreplication from the multiple readings
>>>> from each disk by using the 4 readings on each disk to produce a mean
>>>> for the disk, and analysing those means - exercising caution where
>>>> there are extreme values. I think the 3 disks for each lineage
>>>> themselves are not pseudoreplication, because they are genuinely 3
>>>> disks on a plate: the "Disk Diffusion Test" replicated 3 times - but
>>>> the multiple readings from one disk, I feel, are pseudoreplication.
>>>> I've also read about including Error() terms in a formula.
>>>> 
>>>> I'm unsure whether the two NE. treatments coming from the same culture
>>>> introduce pseudoreplication at the Treatment factor level, given that
>>>> the two different antimicrobials used have two different effects.
>>>> 
>>>> I was hoping for a more expert opinion on whether I have identified
>>>> pseudoreplication correctly, or if there is indeed pseudoreplication in
>>>> the 5 Lineages or anywhere else I haven't seen, and how best this is
>>>> dealt with in R. At the minute my solution to the multiple readings from
>>>> one disk is to simply make a new variable holding the means and do the
>>>> ANOVA from that, or even take the means before I load the dataset into
>>>> R. I'm wondering if an Error() term would be correct.
>>>> 
>>>> Thanks,
>>>> Ben W.
>
> -- 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models