[BioC] Very low P-values in limma

Wed Oct 28 15:11:41 CET 2009

Hi Paul!

What can I say? Your professor is right! In one of his e-mails Gordon wrote the following:

" There are some caveats.  Firstly, the limma duplicateCorrelation method assumes replicates to be equally spaced, and yours are not.  You've actually re-ordered your data to fit it into limma.  So the within-array correlation will be underestimated, and significance over-estimated, for transcripts for which the replicates are unusually close on the array.
Secondly, the within-array correlation is assumed to be the same for all transcripts, which is never actually true.  The approximation has proved worthwhile when ndups=2 or 3, but it will yield over-optimistic results when the number of within-replicates is large."
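
To put a rough number on the second caveat: if the ndups replicates of a transcript all share one common pairwise correlation, their mean carries the information of only ndups / (1 + (ndups - 1) * cor) independent measurements. The Python sketch below just evaluates that textbook formula with the ndups=4 and cor=0.81 quoted further down in this thread. The function name is made up for illustration, and the snippet is not meant to reproduce limma's internal calculation.

```python
def effective_replicates(ndups, cor):
    """Effective number of independent measurements in `ndups` replicates.

    Assumes (for illustration only) equal variances and one common
    pairwise correlation `cor` between all within-array replicates.
    """
    if ndups < 1:
        raise ValueError("ndups must be at least 1")
    if not 0.0 <= cor <= 1.0:
        raise ValueError("cor is expected to lie between 0 and 1 here")
    # Var(mean) = sigma^2 * (1 + (ndups - 1) * cor) / ndups, so the mean is
    # as precise as the mean of this many independent measurements:
    return ndups / (1.0 + (ndups - 1) * cor)


# Values taken from the thread: ndups=4, consensus correlation 0.81.
n_eff = effective_replicates(4, 0.81)
print(f"4 replicates at cor=0.81 are worth about {n_eff:.2f} independent ones")
```

With these numbers the four spots are worth a little under 1.2 independent measurements. Under the same formula a correlation that is estimated too low makes the replicates count for more than they should.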

That probably gives you a hint as to how using limma in this situation might give you "too good" results. As much as it might be an interesting exercise to find out exactly at which point your data violate the assumptions limma makes (or what else is strange here), I would choose the pragmatic solution: simply average across duplicates first and then apply limma. As your colleague says, you give up information about technical variability here, but as far as the comparison between the two groups is concerned this is an absolutely valid approach, and as your p-values then seem to be less significant, you are on the "safe side" regarding control of FDR etc.
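
The averaging step itself is simple. Here is a minimal Python sketch of it, assuming the spots of one array are ordered so that the ndups replicates of each probe sit next to each other. The function name and the intensities are invented for illustration; in practice you would do the equivalent in R and pass the averaged matrix to limma.

```python
from statistics import mean


def average_duplicates(array_values, ndups):
    """Average each run of `ndups` adjacent within-array replicates.

    `array_values` holds the log-intensities of one array, ordered so that
    the replicates of a probe are adjacent (an assumption of this sketch).
    Returns one averaged value per probe.
    """
    if ndups < 1:
        raise ValueError("ndups must be at least 1")
    if len(array_values) % ndups != 0:
        raise ValueError("number of spots must be a multiple of ndups")
    return [
        mean(array_values[start:start + ndups])
        for start in range(0, len(array_values), ndups)
    ]


# Made-up log-intensities for one array: 2 probes, 4 adjacent replicates each.
one_array = [7.0, 7.2, 6.8, 7.0, 9.1, 8.9, 9.0, 9.0]
probe_means = average_duplicates(one_array, ndups=4)
print(probe_means)
```

After this step each array contributes a single value per probe, so the group comparison rests only on the number of arrays and no within-array correlation has to be estimated.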

Claus

> -----Original Message-----
> From: Paul Geeleher [mailto:paulgeeleher at gmail.com]
> Sent: 28 October 2009 13:24
> To: Gordon K Smyth
> Cc: Mayer, Claus-Dieter; Bioconductor mailing list
> Subject: Re: [BioC] Very low P-values in limma
>
> Dear list,
>
> The following are the words of a professor in my department:
>
> I still don't get why the 'real' p-values could be better than
> p-values you get with the assumption of zero measurement error. By
> averaging over within array replicates you are not ignoring the within
> array replicates, instead you are acting as though there were
> infinitely many of them, so that the standard error of the expression
> level within array is zero. Stats is about making inferences about
> populations from finite samples. The population you are making
> inferences about is the population of all late-stage breast cancers.
> The data are from 7 individuals. The within-array replicates give an
> indication of measurement error of the expression levels but don't
> give you a handle on the variability of the quantity of interest in
> the population.
>
> Paul
>
> On Sat, Oct 24, 2009 at 2:44 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> >
> >
> > On Sat, 24 Oct 2009, Gordon K Smyth wrote:
> >
> >> Dear Paul,
> >>
> >> Given your consensus correlation value, limma is treating your
> >> within-array replicates as worth about 1/3 as much as replicates on
> >> independent arrays (because 1-0.81^2 is about 1/3).
> >
> > Sorry, my maths is wrong.  The effective weight of the within-array
> > replicates is quite a bit less than 1/3, given ndups=4 and cor=0.81.
> >
> > Best wishes
> > Gordon
> >
>
>
>
> --
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
