[BioC] Limma: correct calculation of B statistics (log odds)
Gordon Smyth
smyth at wehi.EDU.AU
Fri Apr 21 02:02:26 CEST 2006
Dear Ben,
Please see also my longer reply to Jose in a separate email.
The t-statistics, p-values and gene rankings provided by limma do not
depend on the assumed proportion. In fact part of the motivation for
developing the moderated t-statistics was to obtain a statistic with
the same power as the posterior odds without needing this
difficult-to-estimate quantity.
While the B-statistic does depend on the prior assumed proportion,
this dependence is very straightforward, well understood and
explicit. The prior log-odds simply adds a constant to all the
genewise B-statistics. It doesn't change the ordering.
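
To illustrate the point about the constant shift, here is a minimal
sketch. It is not limma's code. It assumes only that each B-statistic
is the prior log-odds plus a genewise log Bayes factor. The function
names and the log Bayes factor values are made up for illustration.

```python
import math

def log_posterior_odds(log_bayes_factors, proportion):
    """Add the prior log-odds implied by `proportion` to each genewise term."""
    prior_log_odds = math.log(proportion / (1.0 - proportion))
    return [prior_log_odds + lbf for lbf in log_bayes_factors]

def ranking(scores):
    """Gene indices ordered from largest to smallest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical genewise log Bayes factors (illustrative values, not real data).
log_bf = [4.2, -1.3, 0.7, 2.5, -3.0]

# B-statistics under two different assumed proportions of DE genes.
b_01 = log_posterior_odds(log_bf, 0.01)
b_10 = log_posterior_odds(log_bf, 0.10)

# Difference per gene: the same constant for every gene.
shifts = [b - a for a, b in zip(b_01, b_10)]

print(ranking(b_01) == ranking(b_10))  # the ordering is unchanged
print(shifts)                          # one repeated value
```

Moving the assumed proportion from 0.01 to 0.10 multiplies the prior
odds by 11, so every gene's value rises by log(11) and the ranking
stays the same.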
I agree with your desire to avoid dependence on unjustified
assumptions. My approach in limma has been to minimise assumptions
where possible but otherwise to make the assumptions very explicit.
What I personally feel uneasy about are statistical methods which
propose to estimate quantities about which the data contains very
little information. The dependence on assumptions may be hard to see.
It seems to me that the proportion of DE genes is just such a
quantity, because its estimation must be highly sensitive to model
assumptions in small microarray experiments. I could easily provide
an automatic estimate of this quantity as part of the eBayes()
computations in limma, but I deliberately chose not to do this.
Expanding a little further on this topic, it seems to me that a
biologically meaningful treatment of the proportion of truly DE genes
would require a more careful definition of the concept of
differential expression than has so far appeared in the literature.
It seems to me that mathematicians and biologists have different
things in mind when they think of this quantity. Mathematicians are
including many genes with very small fold changes which the
biologists would not consider of interest. A biologically
meaningful treatment would have to specify how large a fold change
needs to be in order to be considered material. I suspect that
biologists are going to be surprised by how sensitive the estimated
proportion is to this threshold.
Best wishes
Gordon
>
>Jose,
>
>I'm very glad you asked this question. One of the things that has made me wary
>of using limma is that the proportion of differentially expressed genes is often
>one of the primary things I'm trying to discover from the data, so I feel uneasy
>making an assumption as to what that proportion is. In your email below, you say
>that the output of limma is sensitive to the assumption, which, of course, makes
>me feel even more uneasy about it.
>I've not noticed any responses on the BioC list. Has anyone commented on this
>issue to you?
>
>-Ben
