[BioC] Problem with p-values calculated by eBayes--corrected format
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Fri Jan 9 18:35:16 CET 2009
On Jan 9, 2009, at 9:21, Chen, Zhuoxun wrote:
> Hi Bioconductors,
>
> I am really sorry about sending this email again. I didn't realize
> that the table in my email would be lost and reformatted. I have
> corrected the format now. Thank you for your patience.
>
> I have a very strange problem with the statistics for my microarray
> data, and I would like to ask for your help.
> I am running a microarray experiment with 16 groups and 3 samples
> per group. On my genechip, every probe is spotted twice.
> Comparing two groups (let's say A and B), I came across a gene that
> is highly significant according to the following code, with a
> p-value = 0.001669417
> ------------------------------------------------------------------------------------------------------------
> corfit <- duplicateCorrelation(Gvsn, design = design, ndups = 2,
> spacing = 1)
> fit <- lmFit(Gvsn, design = design, ndups = 2, spacing = 1,
> correlation = corfit$consensus)
> contrast.matrix <- makeContrasts(A-B, levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit3 <- eBayes(fit2)
> ------------------------------------------------------------------------------------------------------------
> Then I looked at the raw data, copied and pasted it into Excel, and
> did a simple t-test:
>
> A B
> 1 6.938162 7.093199
> 2 7.012382 8.05612
> 3 7.000305 6.99907
This is 1 contrast with 3 samples in each group. But where is the data
from the second probe? And what are the values of corfit?
>
> Avg 6.983616 7.382799
> contrast 0.399182
>
> p-value
> one-tailed, unequal variance t-test = 0.179333
> one-tailed, equal variance t-test = 0.151844
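
The ordinary t-test quoted above is easy to reproduce outside Excel. A
minimal Python sketch of the Welch (unequal-variance) statistic on the
six values from the table; the data are copied from the post, and the
rest is just the standard formulas:

```python
import math

# The six values pasted from the table above
A = [6.938162, 7.012382, 7.000305]
B = [7.093199, 8.056120, 6.999070]

def mean(x):
    return sum(x) / len(x)

def var(x):
    # unbiased sample variance (n - 1 denominator)
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

vA, vB, n = var(A), var(B), 3

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t = (mean(B) - mean(A)) / math.sqrt(vA / n + vB / n)
df = (vA / n + vB / n) ** 2 / (
    (vA / n) ** 2 / (n - 1) + (vB / n) ** 2 / (n - 1)
)

print(round(t, 3), round(df, 2))  # t ~ 1.179 on ~2.02 df
# The corresponding one-tailed p-value is ~0.179, matching the
# Excel figure quoted above.
```

So the Excel numbers are internally consistent; the question is why
limma disagrees, which the reply below addresses.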
>
> The p-value is not even close to 0.05. Then I looked at the contrast
> in fit3$coefficients; it is 0.399182, which indicates that the data
> input to the code is correct.
>
> I don't understand why there is such a huge difference in p-value
> between the two methods. Could somebody please help me with it?
You are both allowing for correlation (which may or may not be
sensible; that is hard to know unless you post more details) and doing
an empirical Bayes correction. So you are pretty far from doing a
standard t-test, and I see no big problem in method "A" giving a
different answer from method "B" when the two methods are quite
different. Explaining the difference in detail is beyond the scope of
an email. A very short answer is that you combine information from
having multiple spots measuring the same transcript, and that you
borrow information about the gene-level variance from the behaviour of
all genes. If you want more details, I suggest you read up on mixed
models as well as empirical Bayes correction. A good starting point
would be Gordon's SAGMB article, cited in limma.
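
To make the "borrowing variance information" point concrete, here is a
small numeric sketch of the variance shrinkage behind limma's
moderated t (Smyth 2004, SAGMB). The prior values d0 and s02 below are
invented purely for illustration; in reality limma estimates them from
all genes, and the duplicate spots and consensus correlation change
the numbers further:

```python
import math

# limma shrinks a gene's sample variance s2 (with d residual df)
# toward a prior variance s02 (with d0 prior df) estimated from all
# genes.  d0 = 4 and s02 = 0.02 are made-up values for illustration.
d, s2 = 4, 0.171912          # pooled sample variance of the 6 values above
d0, s02 = 4.0, 0.02          # hypothetical prior df and prior variance

s2_tilde = (d0 * s02 + d * s2) / (d0 + d)   # moderated variance

beta = 0.399182                       # the contrast B - A
se_unscaled = math.sqrt(1 / 3 + 1 / 3)  # 3 arrays per group

t_ordinary = beta / (math.sqrt(s2) * se_unscaled)
t_moderated = beta / (math.sqrt(s2_tilde) * se_unscaled)

print(round(t_ordinary, 2), round(t_moderated, 2))  # ~1.18 vs ~1.58
# The moderated t is also compared against d0 + d = 8 df rather than
# 4, so both the statistic and the reference distribution change.
```

On top of this moderation, the duplicate spots add information as well
(how much depends on the consensus correlation), which shrinks the
standard error further; together these effects can move a p-value a
long way from the ordinary t-test.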
Kasper