[BioC] Harsh results using limma!

Fri Aug 13 14:36:18 CEST 2004

Hi Gordon

Yes you're right.  I didn't really mean to compare limma to a t-test.
It's just that the results are very consistent within technical
replicates (the dye-swaps), just not consistent between biological
replicates.  But this is the situation we expect - technical replicates
highly correlated and biological replicates much less so.  Clearly
differences of 0.2 could be noise, but my dye-swaps BOTH came up with
0.2.  If I had ten replicate dye-swaps, all with 0.2 as the log(ratio),
would we still call this noise?  Given that the other replicate
experiments were also highly reproducible, I can't help but think this
gene is differentially expressed.

I know why limma and the t-test disregard this gene; I just still think
it is a little harsh and that I am "throwing the baby out with the
bathwater", as it were.

Mick

-----Original Message-----
From: Gordon Smyth [mailto:smyth at wehi.edu.au] 
Sent: 13 August 2004 12:56
To: michael watson (IAH-C)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Harsh results using limma!

At 09:14 PM 13/08/2004, michael watson (IAH-C) wrote:
>Hi
>
>Firstly, I think limma is excellent and use it a lot, but some recent 
>results are a bit, erm, disappointing and I wondered if someone could 
>explain them.
>
>Basic set up was a double dye-swap experiment (4 arrays) involving 
>different animals, one infected with one type of bacterium and the 
>other a different bacterium, compared to one another directly.  I used 
>limma to analyse this and got a list of genes differentially regulated 
>- great!
>
>THEN another replicate experiment was performed (so now I have 6 
>arrays, 3 dye-swaps), and I re-did the analysis and my set of genes was
>completely different - but that's fine, we can put that down to 
>biological variation.  We know limma likes genes which show consistent 
>results across arrays, and when I looked at my data, I found that the 
>genes in my original list were not consistent across all six arrays.  
>So I am reasonably happy about this.
>
>My question comes from looking at the top gene from my old list in the 
>context of all six arrays.  Here are the normalised log ratios across 
>all six arrays (ds indicates the dye-swap):
>
>Gene1
>Exp1     -5.27
>Exp1ds    6.29
>Exp2     -4.61
>Exp2ds    5.54
>Exp3     -0.2
>Exp3ds    0.2

Changes of +-0.2 are tiny and look like pure noise. So, you can have a
gene for which only 2/3 of your mice show a difference. Statistical
methods based on means and standard deviations will always judge this
situation harshly. If you try an ordinary t-test rather than the limma
method, you'll find that this gene would be judged much more harshly
again.

Gordon
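
[A minimal sketch of the "ordinary t-test" mentioned above, applied to
the six log ratios quoted in this thread. It assumes each dye-swap pair
counts as one biological replicate (so n = 3) and that the dye-swap sign
is flipped before averaging; neither choice is confirmed by the thread.
This is not limma's moderated t-statistic, which also borrows
information from the other genes on the array. The variable names are
illustrative only.]

```python
from math import sqrt
from statistics import mean, stdev

# Normalised log ratios quoted in the thread, as (forward, dye-swap)
# pairs, one pair per biological replicate.
log_ratios = {
    "Exp1": (-5.27, 6.29),
    "Exp2": (-4.61, 5.54),
    "Exp3": (-0.20, 0.20),
}

# Assumption: flip the sign of each dye-swap so both arrays measure the
# same contrast, then average the technical pair to get one value per
# biological replicate.
per_replicate = [(fwd - swap) / 2 for fwd, swap in log_ratios.values()]

n = len(per_replicate)
avg = mean(per_replicate)
se = stdev(per_replicate) / sqrt(n)  # standard error of the mean
t_stat = avg / se                    # one-sample t-statistic against 0

# The closed form below for the two-sided p-value is valid only for a
# t-distribution with exactly 2 degrees of freedom (n = 3).
assert n == 3
p_value = 1 - abs(t_stat) / sqrt(t_stat**2 + 2)

print("per-replicate log ratios:", [round(v, 3) for v in per_replicate])
print(f"mean = {avg:.3f}, t = {t_stat:.2f} on {n - 1} df, p = {p_value:.2f}")
```

[Under these assumptions the mean comes out at roughly -3.7, with t of
about -2.1 on 2 degrees of freedom and a two-sided p-value of about
0.17. The third replicate inflates the standard deviation enough to
swamp a large average effect.]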

>Not surprisingly, limma put this as the top gene when looking at the 
>first four arrays.  However, when looking across all six arrays, limma 
>places it at 230 in the list with a p-value of 0.11 (previously the 
>p-value was 0.0004).
>
>So finally we get to my point/question - does this gene really 
>"deserve" a p-value of 0.11 (i.e. not significant)?  In every case the 
>dye-flips are the correct way round; it is only the magnitude of the 
>log(ratio) which differs - and as we are talking about BIOLOGICAL 
>variation here, don't we expect the magnitude to change?  If we are 
>taking into account biological variation, surely we can't realistically
>expect consistent ratios across all replicate experiments?  Isn't limma
>being a little harsh here?  After all, the average log ratio is -3.7
>(taking into account the dye-flips) - and to me, experiment 3's results
>still support the idea of the gene being differentially expressed, and
>are even consistent within that biological replicate.
>
>Clearly I am looking at this data from a biologist's point of view and 
>not a statistician's.  But we are studying biology, not statistics, and 
>I can't help feeling I am missing out on something important here if I 
>disregard this gene as not significantly differentially expressed (NB 
>this is just the first example, there are many others).
>
>I should also add that there appears to be nothing strange about the
>arrays for Experiment 3 - the distribution of log(ratio) for those
>arrays is pretty much the same as the other four, so this is not an
>array-effect, it is an effect due to natural biological variation.
>
>Comments, questions, criticisms all welcome :-)
>
>Mick