[BioC] Very low P-values in limma
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Oct 29 07:50:30 CET 2009
Is it possible that you haven't quoted your professor verbatim? I ask because
these comments don't make sense as they stand. I really don't know what he
might mean by real p-values or the assumption of zero measurement error.
Measurement error obviously can't be zero. Nor can there be an infinite
number of replicates. None of the alternative analysis methods we have
discussed makes either of these assumptions. I wasn't the one arguing for
averaging within-array replicates, but if that method did assume what you
say, then it would have to be an invalid method.
On the other hand, your professor is quite right to say that within-array
replicates measure technical rather than biological variability. In a
univariate analysis, one would simply average the technical replicates.
This would give a summary response variable, with a variance made up of
both biological and technical components, with replicates that you could
reasonably treat as independent.
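
As a rough illustration of the paragraph above, here is a minimal sketch of the variance of a per-array average. It assumes a simple additive model that is not stated in this thread: each spot equals an array-level effect plus independent within-array technical error. The function and argument names are hypothetical.

```python
def variance_of_array_average(var_between, var_within, n_dups):
    """Variance of the mean of n_dups within-array replicates on one array.

    Assumed model (illustrative only): spot = array effect + technical error,
    with the technical errors independent across spots on the same array.
    """
    if n_dups < 1:
        raise ValueError("n_dups must be at least 1")
    # The array-level component is shared by every spot on the array, so
    # averaging does not shrink it; only the within-array part is divided.
    return var_between + var_within / n_dups


# Averaging more spots reduces only the within-array contribution.
for n in (1, 2, 4):
    print(n, variance_of_array_average(1.0, 0.4, n))
```

Under this assumed model the averaged values still carry the full between-array variance, which is why they can reasonably be treated as independent replicates.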
In a genewise microarray analysis, averaging the within-array replicates has
a disadvantage in that it fails to penalize (lower the rank of) genes which
have high within-array variability. If biological variability is high
compared to technical, and you have enough array replicates to get a
decent estimate of between-array variability, then averaging the
within-array replicates is likely still the way to go, just as in a
univariate analysis. On the other hand, if technical variability (within
and between arrays) is relatively large compared to biological, and the
number of array replicates is very small, then the information in the
within-array variances can be too valuable to ignore.
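
To give a feel for how much extra information within-array replicates carry, the sketch below uses the standard textbook formula for the mean of equally correlated observations: n replicates with pairwise correlation rho are worth n / (1 + (n - 1) * rho) independent observations. This is an assumption chosen for illustration, not a description of how limma weights replicates internally; the function name is hypothetical, and the values ndups=4 and cor=0.81 are the ones quoted later in this thread.

```python
def effective_replicates(n_dups, correlation):
    """Effective number of independent observations among n_dups
    equally correlated within-array replicates (illustrative formula)."""
    if n_dups < 1:
        raise ValueError("n_dups must be at least 1")
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1]")
    # Variance of the mean of n equicorrelated values is
    # sigma^2 * (1 + (n - 1) * rho) / n, hence the ratio below.
    return n_dups / (1.0 + (n_dups - 1) * correlation)


# Values from this thread: four duplicate spots, consensus correlation 0.81.
print(effective_replicates(4, 0.81))
```

Under this formula a high within-array correlation means the duplicates add little beyond a single spot, while a low correlation (large within-array technical noise) makes them more informative.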
duplicateCorrelation uses the fact that the between-array variance has a
technical as well as a biological component, and the between and within
technical components tend to be associated across probes for many
microarray platforms. It is this last assumption which allows us to make
use of within-array standard deviations when making inferences about
between sample comparisons.
If your priority is to get reliable p-values, and you think you have
enough array replication to do this, then average the within-array
replicates. If your array replication is limited, technical variability
is high, and your priority is to rank the genes, then duplicateCorrelation
may help. I would add that microarray p-values should always be taken
with a grain of salt, as it's impossible to verify all assumptions in
small experiments, and it's useful instead to think in terms of
independent verification of the results.
This is really as far as I want to debate it. Obviously it's your
analysis and you should use your own judgement. As a maths graduate
student, you would be able to read the published duplicateCorrelation
paper if you want to check the reasoning in detail.
On Wed, 28 Oct 2009, Paul Geeleher wrote:
> Dear list,
> The following are the words of a professor in my department:
> I still don't get why the 'real' p-values could be better than
> p-values you get with the assumption of zero measurement error. By
> averaging over within array replicates you are not ignoring the within
> array replicates, instead you are acting as though there were
> infinitely many of them, so that the standard error of the expression
> level within array is zero. Stats is about making inferences about
> populations from finite samples. The population you are making
> inferences about is the population of all late-stage breast cancers.
> The data are from 7 individuals. The within-array replicates give an
> indication of measurement error of the expression levels but don't
> give you a handle on the variability of the quantity of interest in
> the population.
> On Sat, Oct 24, 2009 at 2:44 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> On Sat, 24 Oct 2009, Gordon K Smyth wrote:
>>> Dear Paul,
>>> Given your consensus correlation value, limma is treating your within-array
>>> replicates as worth about 1/3 as much as replicates on independent arrays
>>> (because 1-0.81^2 is about 1/3).
>> Sorry, my maths is wrong. The effective weight of the within-array
>> replicates is quite a bit less than 1/3, given ndups=4 and cor=0.81.
>> Best wishes
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland