[BioC] Problem with p-values calculated by eBayes--corrected format

Mon Jan 12 14:54:04 CET 2009

Hi Zhuoxun Chen!

As Kasper indicates below, there may be several reasons for the difference you observe, but one of them is in fact quite easy to explain. One of the main differences between limma's version of the t-test and the standard t-test is the standard error of the difference (SED), which is used as the denominator of the t-statistic. Limma shrinks each gene's SED towards a common value estimated from all genes. So for genes with high variances the limma SED will be smaller than the one used by the traditional t-test, and the t-statistic will therefore be larger when using limma. It seems that the one big value in Group B produces a large SED in the standard t-test (hence a non-significant result), but limma shrinks it to a smaller value, which makes the result more significant.
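A minimal Python sketch of this shrinkage for a simple two-group comparison follows. It assumes the moderated variance from Smyth's (2004) empirical Bayes model, s~^2 = (d0*s0^2 + d*s^2)/(d0 + d), where s^2 is the gene's own pooled variance on d degrees of freedom. In limma, the prior variance s0^2 and prior degrees of freedom d0 are estimated from all genes by eBayes. Here they are fixed, hypothetical values, and the expression values are made up for illustration. The sketch ignores the duplicate-spot correlation Kasper mentions and computes only t-statistics, not p-values.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Return the pooled within-group variance s^2 and its residual df d."""
    d = len(a) + len(b) - 2
    ss = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return ss / d, d


def ordinary_t(a, b):
    """Classical two-sample t-statistic using the pooled-variance SED."""
    s2, _ = pooled_variance(a, b)
    sed = math.sqrt(s2 * (1 / len(a) + 1 / len(b)))
    return (mean(b) - mean(a)) / sed


def moderated_t(a, b, s0_sq, d0):
    """Moderated t-statistic: the gene's s^2 is shrunk towards the prior s0^2.

    s0_sq and d0 stand in for the prior variance and prior df that limma's
    eBayes would estimate from all genes; here they are supplied by hand.
    """
    s2, d = pooled_variance(a, b)
    # Weighted average of the prior variance and the gene's own variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    sed = math.sqrt(s2_post * (1 / len(a) + 1 / len(b)))
    # Return the statistic together with its (increased) degrees of freedom.
    return (mean(b) - mean(a)) / sed, d0 + d


# Hypothetical prior, standing in for the values eBayes would estimate.
S0_SQ, D0 = 0.05, 4

# Hypothetical gene 1: one large value in group B inflates its own variance.
outlier_a, outlier_b = [5.0, 5.1, 4.9], [5.2, 5.3, 8.5]
# Hypothetical gene 2: a tiny variance makes the ordinary t-statistic huge.
tight_a, tight_b = [5.0, 5.01, 4.99], [5.1, 5.11, 5.09]

if __name__ == "__main__":
    for name, a, b in [("outlier", outlier_a, outlier_b),
                       ("tight", tight_a, tight_b)]:
        t_mod, df = moderated_t(a, b, S0_SQ, D0)
        print(f"{name}: ordinary t = {ordinary_t(a, b):.3f}, "
              f"moderated t = {t_mod:.3f} on {df} df")
```

With these made-up numbers, the outlier gene's t-statistic rises from about 1.23 to about 1.71 after shrinkage. The low-variance gene's t-statistic drops from about 12.2 to about 0.77. The shrunken variance always lies between the gene's own variance and the prior, so the statistic can move in either direction.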

As several studies have shown, this is a good strategy in general, but obviously there will be cases where a standard t-test leads to a better decision. If you go through the list of all genes, you will probably also find examples where the traditional t-test gives you a spuriously significant result but limma doesn't (as Kasper already wrote: different methods will give different results).

As I said above, the shrinkage of variances/SEDs may not be the only reason for the observed difference, but I assume it is a contributing factor.

Best Wishes

Claus

> > I don't understand why there is such a huge difference in p-value
> > between those two methods. Could somebody please help me with it?
>
> You are both allowing for correlation (which may or may not be
> sensible; that is hard to know unless you post more details) and
> doing an empirical Bayes correction. So you are pretty far from doing a
> standard t-test, and I see no big problem in method "A" giving a
> different answer from method "B" when the two methods are somewhat
> different. Explaining the difference in detail is way
> beyond the scope of an email. A super short answer is that you combine
> information from having multiple spots measuring the same transcript
> and that you borrow information about the gene-level variance from
> looking at the behaviour of all genes. If you want more details, I
> suggest you read up on mixed models as well as empirical Bayes
> correction. A good starting point will be Gordon's SAGMB article, cited
> in limma.
>
> Kasper
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

The University of Aberdeen is a charity registered in Scotland, No SC013683.