[BioC] Limma: correct calculation of B statistics (log odds)

Fri Apr 21 02:02:26 CEST 2006

Dear Ben,

Please see also my longer reply to Jose in a separate email.

The t-statistics, p-values and gene rankings provided by limma do not 
depend on the assumed proportion. In fact part of the motivation for 
developing the moderated t-statistics was to obtain a statistic with 
the same power as the posterior odds without needing this 
difficult-to-estimate quantity.

While the B-statistic does depend on the prior assumed proportion, 
this dependence is very straightforward, well understood and 
explicit. The prior log-odds simply adds a constant to all the 
genewise B-statistics. It doesn't change the ordering.
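
A minimal sketch of this point, in Python rather than R so that it stands 
alone. The five B values are made up for illustration, and the helper names 
(logit, rebase_b, ranking) are hypothetical and not part of limma. The only 
thing taken from the text above is that the prior log-odds enters B as an 
additive constant.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def rebase_b(b_stats, old_proportion, new_proportion):
    """Shift B-statistics from one assumed proportion to another.

    Assumes B = logit(proportion) + (a gene-specific term that does not
    depend on the proportion), so changing the proportion adds the same
    constant to every gene.
    """
    shift = logit(new_proportion) - logit(old_proportion)
    return [b + shift for b in b_stats]

def ranking(values):
    """Gene indices ordered from largest to smallest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Made-up B-statistics for five genes under an assumed proportion of 0.01.
b_at_1pct = [4.2, -1.3, 0.7, -5.0, 2.1]

# The same genes under an assumed proportion of 0.10.
b_at_10pct = rebase_b(b_at_1pct, 0.01, 0.10)

shift = logit(0.10) - logit(0.01)
print("constant added to every B:", round(shift, 3))
print("ranking at 0.01:", ranking(b_at_1pct))
print("ranking at 0.10:", ranking(b_at_10pct))
```

Every B value moves by the same amount, so the two rankings are identical. 
Only the point at which B crosses zero changes.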

I agree with your desire to avoid dependence on unjustified 
assumptions. My approach in limma has been to minimise assumptions 
where possible but otherwise to make the assumptions very explicit.

What I personally feel uneasy about are statistical methods which 
propose to estimate quantities about which the data contains very 
little information. The dependence on assumptions may be hard to see. 
It seems to me that the proportion of DE genes is just such a 
quantity, because its estimation must be highly sensitive to model 
assumptions in small microarray experiments. I could easily provide 
an automatic estimate of this quantity as part of the eBayes() 
computations in limma, but I deliberately chose not to do this.

Expanding a little further on this topic, it seems to me that a 
biologically meaningful treatment of the proportion of truly DE genes 
would require a more careful definition of the concept of 
differential expression than has so far appeared in the literature. 
It seems to me that mathematicians and biologists have different 
things in mind when they think of this quantity. Mathematicians 
include many genes with very small fold changes which biologists 
would not consider of interest. A biologically 
meaningful treatment would have to specify how large a fold change 
needs to be in order to be considered material. I suspect that 
biologists are going to be surprised by how sensitive the estimated 
proportion is to this threshold.
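
A rough numerical sketch of that sensitivity, in Python. It rests on an 
assumption that is not from any real data set: that true log2 fold changes 
across genes follow a normal distribution with mean zero and a hypothetical 
standard deviation of 0.5. The function name proportion_above is also 
hypothetical. Under that assumption, the proportion of genes counted as DE 
is simply the tail area beyond the chosen threshold.

```python
import math

def proportion_above(threshold, sd):
    """Fraction of genes with |log2 fold change| > threshold.

    Assumes true log2 fold changes are N(0, sd^2), so that
    P(|X| > t) = erfc(t / (sd * sqrt(2))).
    """
    return math.erfc(threshold / (sd * math.sqrt(2.0)))

sd = 0.5  # hypothetical spread of true log2 fold changes
thresholds = [0.0, 0.25, 0.5, 1.0]  # candidate cut-offs on the log2 scale
proportions = [proportion_above(t, sd) for t in thresholds]

for t, p in zip(thresholds, proportions):
    print(f"|log2 FC| > {t:4.2f}: proportion DE = {p:.3f}")
```

With a threshold of zero every gene counts as DE under this model, and the 
proportion falls steeply as the threshold is raised. A different assumed 
distribution would give different numbers, which is itself part of the 
point.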

Best wishes
Gordon

>[BioC] Limma: correct calculation of B statistics (log odds)
>Wittner, Ben, Ph.D. Wittner.Ben at mgh.harvard.edu
>Thu Apr 20 19:40:10 CEST 2006
>
>Jose,
>
>I'm very glad you asked this question. One of the things that has made me wary
>of using limma is that the proportion of differentially expressed genes is often
>one of the primary things I'm trying to discover from the data, so I feel uneasy
>making an assumption as to what that proportion is. In your email below, you say
>that the output of limma is sensitive to the assumption, which, of course, makes
>me feel even more uneasy about it.
>I've not noticed any responses on the BioC list. Has anyone commented on this
>issue to you?
>
>-Ben