[BioC] Problem with p-values calculated by eBayes--corrected format
c.mayer at abdn.ac.uk
Mon Jan 12 14:54:04 CET 2009
Hi Zhuoxun Chen!
As Kasper indicates below, there might be a combination of reasons for the difference you observe, but one of them is in fact quite easy to explain. One of the main differences between the limma version of a t-test and the standard t-test is the standard error of the difference (SED), which is used as the denominator of the t-statistic. Limma shrinks that SED towards the average SED across all genes, i.e. for genes with high variances the limma SED will be smaller than the one used by the traditional t-test, and the t-statistic will thus be bigger when using limma. It seems that the one big value in Group B results in a high SED when using the standard t-test (which gives a non-significant result), but limma shrinks it to a smaller number, which makes the result more significant.
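To make the shrinkage concrete, here is a minimal sketch in plain Python (not limma itself). It assumes the moderated variance takes the form (d0*s0^2 + d*s^2)/(d0 + d), where s^2 is the gene's own pooled variance with d residual degrees of freedom. In limma the prior values s0^2 and d0 are estimated from all genes by eBayes; here they are simply hard-coded, and the data, the function names and the prior values are all made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def ordinary_t(a, b):
    """Pooled-variance two-sample t-statistic.

    Returns (t, pooled variance, residual degrees of freedom).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    s2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    sed = sqrt(s2 * (1 / na + 1 / nb))  # standard error of the difference
    return (mean(b) - mean(a)) / sed, s2, df


def moderated_t(a, b, s0_sq, d0):
    """t-statistic with the gene's variance shrunk towards a prior value.

    s0_sq and d0 are ASSUMED here; limma estimates them from all genes.
    Returns (t, shrunken variance, total degrees of freedom).
    """
    na, nb = len(a), len(b)
    _, s2, df = ordinary_t(a, b)
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)  # weighted average of prior and gene variance
    sed = sqrt(s2_post * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / sed, s2_post, d0 + df


# Hypothetical log-ratios for one gene: Group B contains one big value.
group_a = [0.1, -0.2, 0.0, 0.1]
group_b = [0.9, 1.1, 1.0, 4.0]

# Hypothetical prior: typical variance 0.05, worth 4 degrees of freedom.
t_ord, s2, df = ordinary_t(group_a, group_b)
t_mod, s2_post, df_total = moderated_t(group_a, group_b, s0_sq=0.05, d0=4)

print(f"ordinary:  t = {t_ord:.2f} on {df} df, variance = {s2:.3f}")
print(f"moderated: t = {t_mod:.2f} on {df_total} df, variance = {s2_post:.3f}")
```

For this gene the shrunken variance is smaller than the gene's own variance, so the moderated t-statistic is larger, and it is also referred to a t-distribution with more degrees of freedom. For a gene whose own variance lies below the prior value, the same formula pushes the variance up and the t-statistic down.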
As several studies have shown, this is a good strategy in general, but obviously there will be cases where a standard t-test might result in a better decision. If you go through the list of all genes, you will probably also find examples where the traditional t-test gives you a spurious significant result but limma doesn't (as Kasper already wrote: different methods will give different results).
As said before, the shrinkage of variances/SEDs might not be the only reason for the observed difference, but I assume it is a contributing factor.
> > I don't understand why it has such a huge difference on p-value
> > between those two methods. Could somebody please help me with it?
> You are both allowing for correlation (which may or may not be
> sensible, that is hard to know unless you post more details) and you
> do an empirical Bayes correction. So you are pretty far from doing a
> standard t-test, and I see no big problem in method "A" giving a
> different answer from method "B" when the two methods are somewhat
> different. Explaining in detail what the difference is, is way
> beyond the scope of an email. A super short answer is that you combine
> information from having multiple spots measuring the same transcript
> and that you borrow information about the gene-level variance from
> looking at the behaviour of all genes. If you want more details, I
> suggest you read up on mixed models as well as empirical Bayes
> correction. A good starting point will be Gordon's SAGMB article, cited
> in limma.