[BioC] Limma: correct calculation of B statistics (log odds)
J.delasHeras at ed.ac.uk
Fri Apr 21 16:23:19 CEST 2006
Dear Gordon,
many thanks for (as usual) a very helpful and informative answer.
> 2. The B-statistic is the log-posterior odds of differential
> expression. Naturally the posterior probability has to depend on the
> prior probability. However the ranking of the genes given by the
> B-statistic doesn't depend on the prior probability.
But you can't use the B-statistic to choose a suitable cutoff for your
experiment unless the proportion of DE genes has been estimated. Since
the adjusted P-values rank the genes in the same order as B... why use
B at all?
Volcano plots (M vs B) are useful no matter how B is estimated, but
they would be more meaningful if we could say something like "for B>3
we can be pretty confident these genes are truly DE".
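
Since B is the log posterior odds of differential expression, a given B can be read as a posterior probability through the logistic function, which gives a cutoff such as B>3 some meaning. A minimal sketch in Python (the function name is made up for illustration, and the resulting probability is still conditional on whatever prior proportion was assumed when B was computed):

```python
import math

def posterior_prob_from_b(b):
    """Convert a log posterior odds B (natural log) into a posterior probability."""
    return 1.0 / (1.0 + math.exp(-b))

# B = 0 means even odds; larger B means stronger evidence for DE.
for b in (0, 3, 5):
    print(b, round(posterior_prob_from_b(b), 3))
```

Under the assumed prior, B = 3 corresponds to a posterior probability of about 0.95. In R the same conversion is plogis(B).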
> 5. If you have a prior belief based on sound *biological* grounds
> that the overall proportion of DE genes should be substantially
> greater or less than 0.01, then you are permitted and encouraged to
> set the proportion for yourself.
I don't, really. I may believe that more genes are DE in some types of
experiments than in others, but I can't tell what the proportion should
be, based on prior knowledge of the biology. Sometimes I expect "many"
changes and at other times "not so many", and the observations fit... but
how many is many? I don't know. That's why I thought about maybe using
FDR, but it seems that the issue is more complex than that (ah, it always
is...)
> 6. If you really can't resist estimating the proportion of DE genes,
> the best method that I know of is that proposed by Ferkingstad et al
> (2005). This is implemented in limma in the convest() function. In
> limma you can use
>
> p0 <- convest(fit$p.value[,"coefofinterest"])
>
> to estimate the proportion of non-DE genes for your contrast of interest.
>
> Ferkingstad, E., Langaas, M., and Lindqvist, B. (2005). Estimating
> the proportion of true null hypotheses, with application
> to DNA microarray data. Journal of the Royal Statistical Society
> Series B, 67, 555-572.
> http://www.math.ntnu.no/~mettela/SFG/research.imf
Thank you very much for that, Gordon.
> 7. Finally, note that the number of truly DE genes is likely to be
> quite a bit greater than the number of statistically significant DE
> genes. This is because there may be many genes which are DE but with
> such small fold changes that you have no realistic chance of
> detecting them. This fact has two consequences. Firstly it means that
> you can't estimate the proportion simply by looking at the number of
> significant genes. Secondly it means that the proportion of truly DE
> may actually be of no real biological interest even if you knew it.
> This is because it includes, perhaps even is mostly made up of, genes
> of no practical interest because the fold changes are too small to be important.
This is very true...
So, in practical terms, it's probably best to stick to P-values when I
need to set cutoffs, and to use B if I think a volcano plot illustrates
better the point I want to make about the DE genes in a particular
experiment, but without giving too much importance to the actual value
(or to use the convest function to estimate the proportion of DE genes,
bearing in mind that, as you point out above, the true proportion
is likely to be larger, just beyond our limits of detection)...
Does that sound about right?
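
To make the role of the prior concrete: posterior odds are prior odds times the evidence from the data, so the assumed proportion p of DE genes enters B through the prior log-odds log(p/(1-p)). The sketch below, in Python with hypothetical names, evaluates that term for the default p = 0.01 mentioned above and for a proportion derived from a convest-style estimate p0 of non-DE genes. The value p0 = 0.8 is invented purely for illustration and is not a result from any data set.

```python
import math

def prior_log_odds(proportion_de):
    """Prior log-odds of DE for an assumed proportion of DE genes."""
    if not 0.0 < proportion_de < 1.0:
        raise ValueError("proportion_de must lie strictly between 0 and 1")
    return math.log(proportion_de / (1.0 - proportion_de))

default_term = prior_log_odds(0.01)   # default proportion quoted in the thread

p0 = 0.8                              # hypothetical estimate of the non-DE proportion
estimated_term = prior_log_odds(1.0 - p0)

print(round(default_term, 2), round(estimated_term, 2))
```

The gap between the two terms shows how strongly the absolute value of B depends on the assumed proportion. Other parts of the calculation may depend on the proportion as well, so this is not a recipe for rescaling B values that have already been computed.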
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK