[BioC] Problem with p-values calculated by eBayes--corrected format
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Fri Jan 9 18:35:16 CET 2009
On Jan 9, 2009, at 9:21, Chen, Zhuoxun wrote:
> Hi Bioconductors,
>
> I am really sorry about sending this email again. I didn't realize
> that the table in my email would be lost and reformatted. I have
> corrected the format now. Thank you for your patience.
>
> I have a very strange problem with the statistics for my microarray
> data, and I would like to ask for your help.
> I am running a microarray experiment with 16 groups and 3 samples
> per group. On my genechip, every probe is spotted twice.
> Comparing two groups (let's say A and B), I came across a gene that
> is highly significant according to the following code, with a
> p-value = 0.001669417
> ------------------------------------------------------------------------------------------------------------
> corfit <- duplicateCorrelation(Gvsn, design = design, ndups = 2,
> spacing = 1)
> fit <- lmFit(Gvsn, design = design, ndups = 2, spacing = 1,
> correlation = corfit$consensus)
> contrast.matrix <- makeContrasts(A-B, levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit3 <- eBayes(fit2)
> ------------------------------------------------------------------------------------------------------------
> Then I looked at the raw data, copied and pasted it into Excel, and
> did a simple t-test:
>
> A B
> 1 6.938162 7.093199
> 2 7.012382 8.05612
> 3 7.000305 6.99907
This is 1 contrast with 3 samples in each group. But where is the data
from the second probe? And what are the values of corfit?
>
> Avg 6.983616 7.382799
> contrast 0.399182
>
> p-value
> one-tailed, unequal variance t-test = 0.179333
> one-tailed, equal variance t-test = 0.151844
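
The ordinary t-test quoted above is easy to reproduce outside Excel. A
minimal Python sketch of the Welch (unequal-variance) statistic on the
six values from the table; the data are copied from the post, and the
rest is just the standard formulas:

```python
import math

# The six values pasted from the table above
A = [6.938162, 7.012382, 7.000305]
B = [7.093199, 8.056120, 6.999070]

def mean(x):
    return sum(x) / len(x)

def var(x):
    # unbiased sample variance (n - 1 denominator)
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

vA, vB, n = var(A), var(B), 3

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t = (mean(B) - mean(A)) / math.sqrt(vA / n + vB / n)
df = (vA / n + vB / n) ** 2 / (
    (vA / n) ** 2 / (n - 1) + (vB / n) ** 2 / (n - 1)
)

print(round(t, 3), round(df, 2))  # t ~ 1.179 on ~2.02 df
# The corresponding one-tailed p-value is ~0.179, matching the
# Excel figure quoted above.
```

So the Excel numbers are internally consistent; the question is why
limma disagrees, which the reply below addresses.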
>
> The p-value is not even close to 0.05. Then I looked at the contrast
> in fit3$coefficients; it is 0.399182, which indicates that the data
> input to the code is correct.
>
> I don't understand why there is such a huge difference in p-value
> between the two methods. Could somebody please help me with it?
You are both allowing for correlation (which may or may not be
sensible; that is hard to know unless you post more details) and doing
an empirical Bayes correction. So you are pretty far from doing a
standard t-test, and I see no big problem in method "A" giving a
different answer from method "B" when the two methods are quite
different. Explaining the difference in detail is beyond the scope of
an email. A very short answer is that you combine information from
having multiple spots measuring the same transcript, and that you
borrow information about the gene-level variance from the behaviour of
all genes. If you want more details, I suggest you read up on mixed
models as well as empirical Bayes correction. A good starting point
would be Gordon's SAGMB article, cited in limma.
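
To make the "borrowing variance information" point concrete, here is a
small numeric sketch of the variance shrinkage behind limma's
moderated t (Smyth 2004, SAGMB). The prior values d0 and s02 below are
invented purely for illustration; in reality limma estimates them from
all genes, and the duplicate spots and consensus correlation change
the numbers further:

```python
import math

# limma shrinks a gene's sample variance s2 (with d residual df)
# toward a prior variance s02 (with d0 prior df) estimated from all
# genes.  d0 = 4 and s02 = 0.02 are made-up values for illustration.
d, s2 = 4, 0.171912          # pooled sample variance of the 6 values above
d0, s02 = 4.0, 0.02          # hypothetical prior df and prior variance

s2_tilde = (d0 * s02 + d * s2) / (d0 + d)   # moderated variance

beta = 0.399182                       # the contrast B - A
se_unscaled = math.sqrt(1 / 3 + 1 / 3)  # 3 arrays per group

t_ordinary = beta / (math.sqrt(s2) * se_unscaled)
t_moderated = beta / (math.sqrt(s2_tilde) * se_unscaled)

print(round(t_ordinary, 2), round(t_moderated, 2))  # ~1.18 vs ~1.58
# The moderated t is also compared against d0 + d = 8 df rather than
# 4, so both the statistic and the reference distribution change.
```

On top of this moderation, the duplicate spots add information as well
(how much depends on the consensus correlation), which shrinks the
standard error further; together these effects can move a p-value a
long way from the ordinary t-test.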
Kasper