[BioC] Harsh results using limma!

Fri Aug 13 13:14:47 CEST 2004

Hi

Firstly, I think limma is excellent and use it a lot, but some recent
results are a bit, erm, disappointing and I wondered if someone could
explain them.

Basic set up was a double dye-swap experiment (4 arrays) involving
different animals, one infected with one type of bacterium and the other
a different bacterium, compared to one another directly.  I used limma
to analyse this and got a list of genes differentially regulated -
great!

THEN another replicate experiment was performed (so now I have 6 arrays,
3 dye-swaps), and I re-did the analysis and my set of genes was
completely different - but that's fine, we can put that down to
biological variation.  We know limma likes genes which show consistent
results across arrays, and when I looked at my data, I found that the
genes in my original list were not consistent across all six arrays.  So
I am reasonably happy about this.  

My question comes from looking at the top gene from my old list in the
context of all six arrays.  Here are the normalised log ratios across
all six arrays (ds indicates the dye-swap):

Gene1
Exp1      -5.27
Exp1ds     6.29
Exp2      -4.61
Exp2ds     5.54
Exp3      -0.2
Exp3ds     0.2

Not surprisingly, limma put this as the top gene when looking at the
first four arrays.  However, when looking across all six arrays, limma
ranks it 230th in the list with a p-value of 0.11 (previously the
p-value was 0.0004).  

So finally we get to my point/question - does this gene really "deserve"
a p-value of 0.11 (i.e. not significant)?  In every case the dye-flips are
the correct way round; it is only the magnitude of the log(ratio) which
differs - and as we are talking about BIOLOGICAL variation here, don't
we expect the magnitude to change?  If we are taking into account
biological variation, surely we can't realistically expect consistent
ratios across all replicate experiments?  Isn't limma being a little
harsh here?  After all, the average log ratio is -3.7 (taking into
account the dye-flips) - and to me, experiment 3's results still support
the idea of the gene being differentially expressed, and are even
consistent within that biological replicate.
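
To make the arithmetic above concrete, here is a rough sketch that
sign-corrects the dye-swaps and computes the mean log ratio together
with an ordinary one-sample t-statistic, first for four arrays and then
for all six.  Note the assumptions: this is a plain t-statistic, NOT
limma's moderated one, so it will not reproduce limma's p-values, and
the helper names (sign_corrected, one_sample_t) are made up for
illustration only.

```python
import math
import statistics

# Normalised log ratios for Gene1, copied from the table above.
log_ratios = {
    "Exp1": -5.27, "Exp1ds": 6.29,
    "Exp2": -4.61, "Exp2ds": 5.54,
    "Exp3": -0.2, "Exp3ds": 0.2,
}


def sign_corrected(ratios):
    """Flip the sign of dye-swap arrays (names ending in 'ds')."""
    return [-v if name.endswith("ds") else v for name, v in ratios.items()]


def one_sample_t(values):
    """Ordinary (unmoderated) t-statistic for H0: mean log ratio == 0.

    Returns (mean, standard error, t). This is an illustrative stand-in,
    not the statistic limma actually reports.
    """
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se, mean / se


six = sign_corrected(log_ratios)   # all three dye-swap pairs
four = six[:4]                     # first two dye-swap pairs only

for label, vals in (("4 arrays", four), ("6 arrays", six)):
    mean, se, t = one_sample_t(vals)
    print(f"{label}: mean={mean:.2f}  se={se:.2f}  t={t:.2f}")
```

The mean stays large when Exp3 is added, but the standard error grows
to roughly three times its four-array value, so the t-statistic shrinks
a lot.  That is the "consistency" penalty in action, at least for this
simple version of the test.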

Clearly I am looking at this data from a biologist's point of view and
not a statistician's.  But we are studying biology, not statistics, and I
can't help feeling I am missing out on something important here if I
disregard this gene as not significantly differentially expressed (NB
this is just the first example, there are many others). 

I should also add that there appears to be nothing strange about the
arrays for Experiment 3 - the distribution of log(ratio) for those arrays
is pretty much the same as for the other four, so this is not an
array-effect, it is an effect due to natural biological variation.

Comments, questions, criticisms all welcome :-)

Mick