[BioC] Harsh results using limma!
Gordon Smyth
smyth at wehi.edu.au
Fri Aug 13 13:56:05 CEST 2004
At 09:14 PM 13/08/2004, michael watson (IAH-C) wrote:
>Hi
>
>Firstly, I think limma is excellent and use it a lot, but some recent
>results are a bit, erm, disappointing and I wondered if someone could
>explain them.
>
>Basic set up was a double dye-swap experiment (4 arrays) involving
>different animals, one infected with one type of bacterium and the other
>a different bacterium, compared to one another directly. I used limma
>to analyse this and got a list of genes differentially regulated -
>great!
>
>THEN another replicate experiment was performed (so now I have 6 arrays,
>3 dye-swaps), and I re-did the analysis and my set of genes was
>completely different - but that's fine, we can put that down to
>biological variation. We know limma likes genes which show consistent
>results across arrays, and when I looked at my data, I found that the
>genes in my original list were not consistent across all six arrays. So
>I am reasonably happy about this.
>
>My question comes from looking at the top gene from my old list in the
>context of all six arrays. Here are the normalised log ratios across
>all six arrays (ds indicates the dye-swap):
>
>Gene1
>Exp1 -5.27
>Exp1ds 6.29
>Exp2 -4.61
>Exp2ds 5.54
>Exp3 -0.2
>Exp3ds 0.2
Changes of ±0.2 are tiny and look like pure noise. So, you can have a gene
for which only 2/3 of your mice show a difference. Statistical methods
based on means and standard deviations will always judge this situation
harshly. If you try an ordinary t-test rather than the limma method, you'll
find that this gene is judged even more harshly.
Gordon
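As an illustration of the point above (a plain Python sketch, not limma itself), the code below sign-flips the dye-swap arrays, then computes an ordinary one-sample t-statistic on the first four arrays and on all six. It also computes a moderated t in the style of limma's empirical Bayes variance shrinkage, s̃² = (d0·s0² + d·s²)/(d0 + d). The prior values `d0 = 4` and `s0_sq = 0.5` are hypothetical; limma estimates them from all the genes on the arrays, so the moderated figure here is only indicative.

```python
import math
from statistics import mean, variance

# Normalised log ratios for Gene1 as given in the post. Dye-swap arrays ("ds")
# are sign-flipped so all six values measure the same direction of change.
raw = {"Exp1": -5.27, "Exp1ds": 6.29, "Exp2": -4.61,
       "Exp2ds": 5.54, "Exp3": -0.2, "Exp3ds": 0.2}
adjusted = [-v if name.endswith("ds") else v for name, v in raw.items()]


def ordinary_t(x):
    """One-sample t-statistic for H0: mean log ratio = 0 (df = n - 1)."""
    n = len(x)
    return mean(x) / math.sqrt(variance(x) / n)


def moderated_t(x, d0, s0_sq):
    """Moderated t for a one-sample design, in the style of limma's eBayes.

    The gene's sample variance is shrunk towards a prior variance s0_sq that
    carries d0 degrees of freedom. Here d0 and s0_sq are HYPOTHETICAL inputs;
    limma estimates them from all genes on the array.
    """
    n = len(x)
    d = n - 1
    s_tilde_sq = (d0 * s0_sq + d * variance(x)) / (d0 + d)
    return mean(x) / math.sqrt(s_tilde_sq / n)


first_four = adjusted[:4]
print(f"4 arrays: mean = {mean(first_four):.2f}, "
      f"SD = {math.sqrt(variance(first_four)):.2f}, t = {ordinary_t(first_four):.1f}")
print(f"6 arrays: mean = {mean(adjusted):.1f}, "
      f"SD = {math.sqrt(variance(adjusted)):.2f}, t = {ordinary_t(adjusted):.2f}")
print(f"6 arrays: moderated t (hypothetical prior) = "
      f"{moderated_t(adjusted, d0=4, s0_sq=0.5):.2f}")
```

With these inputs the mean log ratio only moves from about -5.4 to -3.7, but the standard deviation rises from about 0.70 to about 2.75, so the ordinary t-statistic drops from roughly -15.6 to roughly -3.3. Shrinking the variance towards a smaller prior value gives a larger |t| (about 4.3 here), which is the sense in which the moderated statistic is less harsh than an ordinary t-test for a gene like this.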
>Not surprisingly, limma put this as the top gene when looking at the
>first four arrays. However, when looking across all six arrays, limma
>places it at 230 in the list with a p-value of 0.11 (previously the
>p-value was 0.0004).
>
>So finally we get to my point/question - does this gene really "deserve"
>a p-value of 0.11 (i.e. not significant)? In every case the dye-flips are
>the correct way round, it is only the magnitude of the log(ratio) which
>differs - and as we are talking about BIOLOGICAL variation here, don't
>we expect the magnitude to change? If we are taking into account
>biological variation, surely we can't realistically expect consistent
>ratios across all replicate experiments?? Isn't limma being a little
>harsh here? After all the average log ratio is -3.7 (taking into
>account the dye-flips) - and to me, experiment 3's results still support
>the idea of the gene being differentially expressed, and are even
>consistent within that biological replicate.
>
>Clearly I am looking at this data from a biologist's point of view and
>not a statistician's. But we are studying biology, not statistics, and I
>can't help feeling I am missing out on something important here if I
>disregard this gene as not significantly differentially expressed (NB
>this is just the first example, there are many others).
>
>I should also add that there appears to be nothing strange about the arrays
>for Experiment 3 - the distribution of log(ratio) for those arrays is
>pretty much the same as the other four, so this is not an array-effect,
>it is an effect due to natural biological variation.
>
>Comments, questions, criticisms all welcome :-)
>
>Mick