[BioC] [Bioc-devel] technical and biological replicates in the same Exprset - Agi4x44

Tue May 8 17:30:25 CEST 2012

Hi Paola,

First, note that you sent this to the wrong list. Bioc-devel is for 
developers of BioC packages, not for questions about how to use them. The 
correct list is Bioc-help, where I have re-routed the thread.

On 5/8/2012 10:02 AM, Paola Sgadò wrote:
> Hi all,
> I'm having some problems with a microarray analysis. I am a biologist and not very good with R or with statistics!
> I'm using Agilent 4x44 arrays and the Agi4x44PreProcess package. I basically have to compare WT vs KO data. The microarray experiment was done first with 3 true biological replicates and later with 4 technical replicates of a pool of RNAs.
> My design is the following:
>> targets
> FileName      Treat  GErep  Subject   Array  Repl.
> 549_1_4.txt   KO     2      genotype  1      KO1
> 550_1_4.txt   KO     2      genotype  2      KO2
> 551_1_4.txt   KO     2      genotype  3      KO3
> 549_1_3.txt   WT     1      genotype  1      WT1
> 550_1_3.txt   WT     1      genotype  2      WT2
> 551_1_3.txt   WT     1      genotype  3      WT3
> 385_1_1.txt   WT     3      genotype  4      WT4
> 385_1_2.txt   KO     4      genotype  4      KO4
> 385_1_3.txt   WT     3      genotype  4      WT4
> 385_1_4.txt   KO     4      genotype  4      KO4
> 386_1_2.txt   WT     3      genotype  5      WT4
> 386_1_3.txt   KO     4      genotype  5      KO4
> 386_1_4.txt   WT     3      genotype  5      WT4
> I performed normalization and filtering on the entire set of arrays, but when I started the statistical analysis using eBayes in limma I realized I could not treat biological replicates (WT1,2,3 and KO1,2,3) and technical replicates (WT4 and KO4) the same way.
> I tried to use the dupcor function, but it does not work with technical and biological replicates in the same analysis. Is there a way to work around the problem?
> Thanks for your help, I really cannot find a way out...

There is a famous quote by Sir Ronald Fisher that may well apply here:

"To call in the statistician after the experiment is done may be no more 
than asking him to perform a post-mortem examination: he may be able to 
say what the experiment died of."

You are correct that you cannot treat biological and technical replicates 
the same way. Nor can you treat single samples and pools the same way 
(the pool itself is biologically 'smoothed', so the expected variance 
will be lower for a pool than for a sample from a single subject).
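To make the pooling argument concrete, here is a small simulation sketch in Python (purely illustrative, not part of any limma workflow). It assumes a simple additive model for one gene on the log scale: each subject deviates from the mean with biological SD `sd_bio`, each array adds technical noise with SD `sd_tech`, and a pool is treated as the average of `m_pooled` subjects before hybridization. Under those assumptions Var(single) = sd_bio^2 + sd_tech^2 while Var(pool) = sd_bio^2 / m_pooled + sd_tech^2. The pool size of 5 and the SD values are hypothetical; the original message does not say how many animals went into the pool, and real pooling averages mRNA rather than log values, so this is only an approximation.

```python
import random
import statistics


def simulate(n_draws, m_pooled, sd_bio, sd_tech, seed=1):
    """Return (var_single, var_pool) for simulated log-expression values of one gene.

    Hypothetical model: subject value = mu + N(0, sd_bio); each array adds N(0, sd_tech).
    A pooled sample is assumed to be the mean of m_pooled subjects before hybridization.
    """
    rng = random.Random(seed)
    mu = 8.0  # arbitrary baseline log2 expression
    singles, pools = [], []
    for _ in range(n_draws):
        # One array hybridized with RNA from a single subject.
        subject = mu + rng.gauss(0, sd_bio)
        singles.append(subject + rng.gauss(0, sd_tech))
        # One array hybridized with a pool: biological variation is averaged first.
        pooled = statistics.fmean([mu + rng.gauss(0, sd_bio) for _ in range(m_pooled)])
        pools.append(pooled + rng.gauss(0, sd_tech))
    return statistics.variance(singles), statistics.variance(pools)


var_single, var_pool = simulate(20000, m_pooled=5, sd_bio=1.0, sd_tech=0.3)
print(f"single-subject arrays: {var_single:.3f} (model: {1.0**2 + 0.3**2:.3f})")
print(f"pooled arrays:         {var_pool:.3f} (model: {1.0**2 / 5 + 0.3**2:.3f})")
```

With these made-up parameters the pooled arrays show roughly a quarter of the variance of single-subject arrays, which is why mixing the two in one variance estimate is a problem.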

So you have bad choices and worse choices. In order from bad to 
indefensible:

1.) Exclude all pooled samples. This is bad because you have just wasted 
all those arrays, but it is the easiest option to defend if you try to 
publish.
2.) Exclude all but one pooled WT array and one pooled KO array. This is 
bad for the reason above, plus you are assuming that a pool is the same 
as a single sample. Sort of hard to explain in a paper as well.
3.) Use all the data, pretending that they are all biological 
replicates. Really hard to defend, and the gain in power from increasing 
N will likely be offset by the fact that the variation among all those 
pooled technical replicates won't actually be biological signal, but 
noise (any differences between technical replicates cannot possibly be 
due to biological differences, hence are only noise).
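A rough Python sketch of why option 3 is risky (a simplified illustration, not a limma analysis): it simulates one group whose arrays are all technical replicates of a single biological sample and compares the standard error you would compute by treating those arrays as independent replicates with the actual spread of the group mean across repeated experiments. The SD values and the number of replicates are hypothetical choices, and the model ignores pooling and any moderation of variances that eBayes would apply.

```python
import math
import random
import statistics


def repeat_experiment(n_experiments, n_tech, sd_bio, sd_tech, seed=2):
    """Simulate a group measured as n_tech technical replicates of ONE biological sample.

    Returns (average naive SE, actual SD of the group mean across experiments).
    The naive SE treats the n_tech arrays as if they were independent biological replicates.
    """
    rng = random.Random(seed)
    naive_ses, group_means = [], []
    for _ in range(n_experiments):
        bio = rng.gauss(0, sd_bio)  # single biological draw shared by every array
        arrays = [bio + rng.gauss(0, sd_tech) for _ in range(n_tech)]
        naive_ses.append(statistics.stdev(arrays) / math.sqrt(n_tech))
        group_means.append(statistics.fmean(arrays))
    return statistics.fmean(naive_ses), statistics.stdev(group_means)


naive_se, true_sd = repeat_experiment(5000, n_tech=4, sd_bio=1.0, sd_tech=0.3)
print(f"apparent SE from technical replicates: {naive_se:.3f}")
print(f"actual spread of the group mean:       {true_sd:.3f}")
```

Under these assumptions the technical replicates only capture technical noise, so the apparent standard error is several times smaller than the real between-experiment variability, and any test built on it would be badly overconfident.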

In the end you will have to validate any targets that arise from the 
microarray experiment, so really what you are trying to do is minimize 
spurious results that cause you to waste time running RT-PCR on genes 
that aren't differentially expressed.

Best,

Jim

> Cheers
> Paola
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099