[BioC] Limma: correct calculation of B statistics (log odds)

Fri Apr 21 16:23:19 CEST 2006

Dear Gordon,

Many thanks for (as usual) a very helpful and informative answer.

> 2. The B-statistic is the log-posterior odds of differential
> expression. Naturally the posterior probability has to depend on the
> prior probability. However the ranking of the genes given by the
> B-statistic doesn't depend on the prior probability.

But you can't use the B statistic to choose a suitable cutoff for your 
experiment unless the proportion of DE genes has been estimated. Since 
the adjusted p-values rank the genes in the same order as B... why use 
B at all?

Volcano plots (M vs B) are useful no matter how B is estimated, but 
they would be more meaningful if we could say something like "for B>3 
we can be pretty confident these genes are truly DE"
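
To make the "B>3" idea concrete: B is a log odds, so it converts 
directly into a posterior probability. The sketch below uses Python 
purely as a calculator. The function names are made up for illustration 
and are not part of limma. It assumes B is a natural-log odds and that 
the prior proportion enters B only as an additive log prior odds term, 
which is consistent with the ranking not depending on the prior. The 
proportions 0.01 and 0.05 are example values only.

```python
import math

def logit(p):
    """Log odds (natural log) of a probability p."""
    return math.log(p / (1.0 - p))

def posterior_prob(b):
    """Convert a log-odds value B into a posterior probability of DE."""
    return 1.0 / (1.0 + math.exp(-b))

def rescale_b(b, old_prop, new_prop):
    """Shift a B computed under old_prop to its value under new_prop.

    Assumption: the prior proportion contributes only the additive term
    logit(proportion), so every gene's B moves by the same constant.
    """
    return b + logit(new_prop) - logit(old_prop)

b = 3.0
print(round(posterior_prob(b), 3))   # 0.953

# Same gene if the assumed proportion of DE genes were 0.05 instead of 0.01
b_new = rescale_b(b, 0.01, 0.05)
print(round(b_new, 2), round(posterior_prob(b_new), 3))   # 4.65 0.991
```

So, under these assumptions, "B>3" means a posterior probability above 
about 0.95 only for the proportion that was assumed. A different 
proportion moves every B by the same amount, leaving the ranking 
unchanged but shifting any fixed cutoff.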

> 5. If you have a prior belief based on sound *biological* grounds
> that the overall proportion of DE genes should be substantially
> greater or less than 0.01, then you are permitted and encouraged to
> set the proportion for yourself.

I don't, really. I may believe that more genes are DE in some types of 
experiments than others, but I can't tell what the proportion should 
be, based on prior knowledge of the biology. Sometimes I expect "many" 
changes, and others "not so many", and the observations fit... but how 
many is many? I don't know. That's why I thought about maybe using FDR, 
but it seems that the issue is more complex than that (ah, it always 
is...)

> 6. If you really can't resist estimating the proportion of DE genes,
> the best method that I know of is that proposed by Ferkingstad et al
> (2005). This is implemented in limma in the convest() function. In
> limma you can use
>
>    p0 <- convest(fit$p.value[,"coefofinterest"])
>
> to estimate the proportion of non-DE genes for your contrast of interest.
>
> Ferkingstad, E., Langaas, M., and Lindqvist, B. (2005). Estimating
> the proportion of true null hypotheses, with application
> to DNA microarray data. Journal of the Royal Statistical Society
> Series B, 67, 555-572.
> http://www.math.ntnu.no/~mettela/SFG/research.imf

Thank you very much for that, Gordon.

> 7. Finally, note that the number of truly DE genes is likely to be
> quite a bit greater than the number of statistically significant DE
> genes. This is because there may be many genes which are DE but with
> such small fold changes that you have no realistic chance of
> detecting them. This fact has two consequences. Firstly it means that
> you can't estimate the proportion simply by looking at the number of
> significant genes. Secondly it means that the proportion of truly DE
> genes may actually be of no real biological interest even if you knew it.
> This is because it includes, perhaps even is mostly made up of, genes
> of no practical interest because the fold changes are too small to be important.

This is very true...
So, in practical terms, it's probably best to stick to p-values when I 
need to choose cutoffs, and to use B when I think a volcano plot better 
illustrates the point I want to make about the DE genes in a particular 
experiment, without giving too much importance to the actual value 
(or to use the convest function to estimate the proportion of DE genes, 
bearing in mind that, as you point out above, the true proportion 
is likely to be larger, just beyond our limits of detection)...

Does that sound about right?

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK