[BioC] [Bioc-devel] technical and biological replicates in the same Exprset - Agi4x44
James W. MacDonald
jmacdon at uw.edu
Tue May 8 17:30:25 CEST 2012
First, note that you sent this to the wrong list. Bioc-devel is for
developers of BioC packages, not questions about how to use them. The
correct list is Bioc-help, where I have re-routed the thread.
On 5/8/2012 10:02 AM, Paola Sgadò wrote:
> Hi all,
> I'm having a problem with a microarray analysis. I am a biologist and not very good with either R or statistics!
> I'm using Agilent 4x44 arrays and the Agi4x44Processed package. Basically, I have to compare WT vs KO data. The experiment was run first with 3 true biological replicates and later with 4 technical replicates of a pool of RNAs.
> My design is the following:
> FileName     Treat  GErep  Subject   Array  Repl.
> 549_1_4.txt  KO     2      genotype  1      KO1
> 550_1_4.txt  KO     2      genotype  2      KO2
> 551_1_4.txt  KO     2      genotype  3      KO3
> 549_1_3.txt  WT     1      genotype  1      WT1
> 550_1_3.txt  WT     1      genotype  2      WT2
> 551_1_3.txt  WT     1      genotype  3      WT3
> 385_1_1.txt  WT     3      genotype  4      WT4
> 385_1_2.txt  KO     4      genotype  4      KO4
> 385_1_3.txt  WT     3      genotype  4      WT4
> 385_1_4.txt  KO     4      genotype  4      KO4
> 386_1_2.txt  WT     3      genotype  5      WT4
> 386_1_3.txt  KO     4      genotype  5      KO4
> 386_1_4.txt  WT     3      genotype  5      WT4
> I performed normalization and filtering with the entire set of arrays, but when I started the statistical analysis using ebayes with limma I realized I could not treat biological (WT1,2,3-KO1,2,3) and technical replicates (WT4-KO4) the same way.
> I tried to use the dupcor function, but it does not work with tech and biol replicates in the same analysis. Is there a way to bypass the problem?
> Thanks for your help, I really cannot find the way out....
There is a famous quote by Sir Ronald Fisher that may well apply here:
"To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of."
You are correct that you cannot use biological and technical replicates
the same way. Nor can you treat single samples and pools the same way
(the pool itself is biologically 'smoothed', so the expected variance
will be lower for a pool than for a sample from a single subject).
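To make the pooling point concrete, here is a rough sketch in plain Python (not R or limma, just a toy simulation). It assumes a simple additive model on the log scale: each subject contributes biological variation with standard deviation sigma_bio, each array adds technical noise with standard deviation sigma_tech, and a pool behaves like the average of its subjects. The values 1.0 and 0.3 and the pool size of 5 are made up for illustration; none of them come from your experiment. Under those assumptions the expected variance of a single-subject array is about 1.09, and for a pooled array about 0.29.

```python
import random
import statistics


def expected_variance(sigma_bio, sigma_tech, pool_size=1):
    """Expected variance of one array when RNA from pool_size subjects is mixed.

    Assumes the pool averages the subjects' biological values, so the
    biological part shrinks by 1/pool_size while technical noise does not.
    """
    return sigma_bio ** 2 / pool_size + sigma_tech ** 2


def simulate_arrays(n_arrays, sigma_bio, sigma_tech, pool_size, rng):
    """Simulate one gene measured on n_arrays independent arrays."""
    values = []
    for _ in range(n_arrays):
        # Biological value of the sample: mean over the pooled subjects.
        bio = statistics.fmean(
            rng.gauss(0.0, sigma_bio) for _ in range(pool_size)
        )
        # Each array adds its own technical noise.
        values.append(bio + rng.gauss(0.0, sigma_tech))
    return values


rng = random.Random(1)  # fixed seed so the run is repeatable
SIGMA_BIO, SIGMA_TECH, POOL_SIZE = 1.0, 0.3, 5  # illustrative assumptions

single = simulate_arrays(20000, SIGMA_BIO, SIGMA_TECH, 1, rng)
pooled = simulate_arrays(20000, SIGMA_BIO, SIGMA_TECH, POOL_SIZE, rng)

print("single subject: simulated", statistics.variance(single),
      "expected", expected_variance(SIGMA_BIO, SIGMA_TECH, 1))
print("pool of 5:      simulated", statistics.variance(pooled),
      "expected", expected_variance(SIGMA_BIO, SIGMA_TECH, POOL_SIZE))
```

The only point of the sketch is that the two kinds of arrays do not share a variance, which is what a single linear model fit to all of them would implicitly assume.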
So you have bad choices and worse choices. In order from bad to worse:
1.) Exclude all pooled samples. This is bad because you just wasted all
those arrays, but is the easiest to defend if you try to publish.
2.) Exclude all but one each of pooled WT and KO. This is bad for the
reason above, plus you are assuming that a pool is the same as a single
sample. Sort of hard to explain in a paper as well.
3.) Use all the data, pretending that they are all biological
replicates. Really hard to defend, and the gain in power from increasing
N will likely be offset by the fact that the signal from all those
pooled technical replicates won't actually be signal, but noise (any
differences between technical replicates cannot possibly be due to
biological differences, hence they are only noise).
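As a hedged illustration of why option 3 is hard to defend, the toy Python simulation below (again not limma, and using the same made-up values, sigma_bio = 1.0 and sigma_tech = 0.3, with 4 arrays per group as an assumption) compares two things for a group mean: the standard error you would compute by treating the arrays as independent replicates, and how much the group mean actually varies from experiment to experiment. For true biological replicates the two should roughly agree. For technical replicates of one sample the computed standard error only sees technical noise, so it comes out far too small.

```python
import random
import statistics


def biological_replicates(n_reps, sigma_bio, sigma_tech, rng):
    """n_reps different subjects, one array each."""
    return [rng.gauss(0.0, sigma_bio) + rng.gauss(0.0, sigma_tech)
            for _ in range(n_reps)]


def technical_replicates(n_reps, sigma_bio, sigma_tech, rng):
    """One sample hybridized n_reps times: shared biology, separate noise."""
    bio = rng.gauss(0.0, sigma_bio)
    return [bio + rng.gauss(0.0, sigma_tech) for _ in range(n_reps)]


def naive_vs_actual_se(sampler, n_reps, n_experiments, rng,
                       sigma_bio=1.0, sigma_tech=0.3):
    """Return (average naive SE of the group mean, actual SD of the group mean).

    The naive SE treats the n_reps arrays as independent replicates.
    """
    means, naive = [], []
    for _ in range(n_experiments):
        x = sampler(n_reps, sigma_bio, sigma_tech, rng)
        means.append(statistics.fmean(x))
        naive.append(statistics.stdev(x) / n_reps ** 0.5)
    return statistics.fmean(naive), statistics.stdev(means)


rng = random.Random(2)  # fixed seed so the run is repeatable
bio_naive, bio_actual = naive_vs_actual_se(biological_replicates, 4, 5000, rng)
tech_naive, tech_actual = naive_vs_actual_se(technical_replicates, 4, 5000, rng)

print("biological replicates: naive SE", bio_naive, "actual", bio_actual)
print("technical replicates:  naive SE", tech_naive, "actual", tech_actual)
```

A standard error that is too small means too many small p-values, which is exactly the kind of spurious result you would then spend time chasing.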
In the end you will have to validate any targets that arise from the
microarray experiment, so really what you are trying to do is minimize
spurious results that cause you to waste time running RT-PCR on genes
that aren't differentially expressed.
James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099