[BioC] Very low P-values in limma

Thu Oct 29 13:13:56 CET 2009

Dear Paul,

to see what's going on, it may be helpful to step away from Limma for  
a moment, and think of a case-control experiment with one array per  
subject in the context of standard mixed-effects ANOVA. The model for  
a gene will look like that:

log(intensity) = GrandMean + Group{Fixed} + Array(Group){Random,  
confounded with Subject} + replicate(Array){Random}

With this model, you test the difference between the groups against  
the variation between arrays. In the special case of balanced design  
(i.e. an equal number of arrays per group, and an equal number of  
replicates per array), this test only involves averages of replicates  
in each array. In other words, using this model will be equivalent to  
using the model

log(averageIntensityOverReplicates) = GrandMean + Group{Fixed} +  
Array(Group){Random, confounded with Subject}

after averaging the spots on an array. However the models are not  
equivalent if your design is unbalanced or if you have missing values.

The difficulty with the ANOVA is that, when you have a small number of  
arrays (=subjects) per group, the variance of Array(Group) is not  
estimated reliably in a single gene. In Limma, Gordon took a different  
approach, and estimated the variance of Array(Group) prior to fitting  
the model, using the information across all genes. The advantage is  
that the estimator uses more data, and is therefore more reliable. The  
disadvantages are that (1) it makes the assumption of a same variance  
of Array(Group) in all genes, and (2) the subsequent model fit assumes  
that this variance is known, and not estimated from the same data.

Therefore, I second Gordon's opinion: if you have enough arrays per  
group and a balanced design, than averaging is appropriate. If you  
have few arras but many technical replicates, estimating this variance  
using the duplicateCorrelation() may work best.

And one more comment: although it is true that in many cases we want  
to make inference about populations (e.g. patients) from a random  
sample of individuals, this is not always the case. Experiments with a  
small number of subjects may not contain enough information about the  
underlying population. In this case one may want to restrict the  
inference to gene expressions in these specific subjects, and only  
account for technical variation in the microarray measurements. This  
translates into the ANOVA model

log(intensity) = GrandMean + Group{Fixed} + Array(Group){Fixed,  
confounded with Subject} + replicate(Array){Random}

In Limma, this will amount to adding "Array" as an extra model term,  
and skipping duplicateCorrelation().

Best regards
Olga Vitek
Assistant Professor
Statistics and Computer Science
Purdue University

>
>
> ------------------------------
>
> Message: 18
> Date: Thu, 29 Oct 2009 17:50:30 +1100 (AUS Eastern Daylight Time)
> From: Gordon K Smyth <smyth at wehi.EDU.AU>
> Subject: Re: [BioC] Very low P-values in limma
> To: Paul Geeleher <paulgeeleher at gmail.com>
> Cc: "Mayer, Claus-Dieter" <c.mayer at abdn.ac.uk>,	Bioconductor mailing
> 	list <bioconductor at stat.math.ethz.ch>
> Message-ID: <Pine.WNT.4.64.0910291625370.4880 at PC602.alpha.wehi.edu.au>
> Content-Type: text/plain; charset="x-unknown"; Format="flowed"
>
> Dear Paul,
>
> Is it possible that you haven't quoted your professor verbatim?,  
> because
> these comments don't make sense as they stand.  I really know what he
> might mean by real p-values or the assumption of zero measurement  
> error.
> Measurement error obviously can't be zero.  Nor can there be an  
> infinite
> number of replicates.  None of the alternative analysis methods we  
> have
> discussed makes either of these assumptions.  I wasn't the one  
> arguing for
> averaging within-array replicates, but if that method did assume  
> what you
> say, then it would have to be an invalid method.
>
> On the other hand, your professor is quite right to say that within- 
> array
> replicates measure technical rather than biological variability.  In a
> univariate analysis, one would simply average the technical  
> replicates.
> This would give a summary reponse variable, with a variance made up of
> both biological and technical components, with replicates that you  
> could
> reasonably treat as independent.
>
> In a genewise microarray analysis, averaging the within-replicates  
> has a
> disadvantage in that it fails to penalize (lower the rank of) genes  
> which
> have high within-array variability.  If biological variability is high
> compared to technical, and you have a enough array replicates to get a
> decent estimate of between-array variability, then averaging the
> within-array replicates is likely still the way to go, just as in a
> univariate analysis.  On the other hand, if technical variability  
> (within
> and between arrays) is relatively large compared to biological, and  
> the
> number of array replicates is very small, then the information in the
> within-array variances can be too valuable to ignore.
> duplicateCorrelation uses the fact that the between-array variance  
> has a
> technical as well as a biological component, and the between and  
> within
> technical components tend to be associated across probes for many
> microarray platforms.  It is this last assumption which allows us to  
> make
> use of within-array standard deviations when making inferences about
> between sample comparisons.
>
> If your priority is to get reliable p-values, and you think you have
> enough array replication to do this, then average the within-array
> replicates.  If your array replication is limited, technical  
> variability
> is high, and your priority is to rank the genes, then  
> duplicateCorrelation
> may help.  I would add that microarray p-values should always be taken
> with a grain of salt, as it's impossible to verify all assumptions in
> small experiments, and it's useful instead to think in terms of
> independent verification of the results.
>
> This is really as far as I want to debate it.  Obviously it's your
> analysis and you should use your own judgement.  As a maths graduate
> student, you would be able to read the duplicateCorrelation published
> paper if you want to check the reasoning in detail.
>
> Best wishes
> Gordon
>
>
> On Wed, 28 Oct 2009, Paul Geeleher wrote:
>
>> Dear list,
>>
>> The following are the words of a professor in my department:
>>
>> I still don't get why the 'real' p-values could be better than
>> p-values you get with the assumption of zero measurement error. By
>> averaging over within array replicates you are not ignoring the  
>> within
>> array replicates, instead you are acting as though there were
>> infinitely many of them, so that the standard error of the expression
>> level within array is zero. Stats is about making inferences about
>> populations from finite samples. The population you are making
>> inferences about is the population of all late-stage breast cancers.
>> The data are from 7 individuals. The within-array replicates give an
>> indication of measurement error of the expression levels but don't
>> give you a handle on the variability of the quantity of interest in
>> the population.
>>
>> Paul
>>
>> On Sat, Oct 24, 2009 at 2:44 AM, Gordon K Smyth <smyth at wehi.edu.au>  
>> wrote:
>>>
>>>
>>> On Sat, 24 Oct 2009, Gordon K Smyth wrote:
>>>
>>>> Dear Paul,
>>>>
>>>> Give your consensus correlation value, limma is treating your  
>>>> within-array
>>>> replicates as worth about 1/3 as much as replicates on  
>>>> independent arrays
>>>> (because 1-0.81^2 is about 1/3).
>>>
>>> Sorry, my maths is wrong. ?The effective weight of the within-array
>>> replicates is quite a bit less than 1/3, given ndups=4 and cor=0.81.
>>>
>>> Best wishes
>>> Gordon
>>>
>>
>>
>>
>> -- 
>> Paul Geeleher
>> School of Mathematics, Statistics and Applied Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>>