[BioC] edgeR and sagenhaft

Fri Feb 13 15:31:46 CET 2009

I have 4 large tag datasets: A1, A2, B1 and B2.  The purpose of the
experiment was to determine differences in gene expression between A and B.
A1 and B1 were run together as batch 1, and A2 and B2 were run
together as batch 2.

I ran several analyses and am completely puzzled.

First I ran sage.test (Fisher's exact test) on A1 vs. B1 and on A2 vs.
B2.  The results were strongly concordant: the significant gene lists
overlapped heavily, and on the whole the same genes were up- or
down-regulated in both batches.
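
For readers who want to see what a per-tag Fisher's exact test computes,
here is a minimal sketch in Python (standard library only).  It is an
illustration under stated assumptions, not the code behind sage.test: the
function name and arguments are hypothetical, x1 and x2 are the counts of
one tag in two libraries, and n1 and n2 are the total tag counts of those
libraries.

```python
from math import comb

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact p-value for one tag.

    x1, x2: counts of the tag in library 1 and library 2 (hypothetical names)
    n1, n2: total number of tags sequenced in each library
    """
    k = x1 + x2            # total copies of this tag across both libraries
    total = n1 + n2        # total tags across both libraries
    denom = comb(total, k)

    def pmf(i):
        # Hypergeometric probability that library 1 holds i of the k copies
        return comb(n1, i) * comb(n2, k - i) / denom

    lo = max(0, k - n2)
    hi = min(k, n1)
    p_obs = pmf(x1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    p = sum(pmf(i) for i in range(lo, hi + 1) if pmf(i) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

The exact enumeration is only practical for modest counts; it is meant to
show the logic of the test, not to scale to thousands of tags.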

Then I ran edgeR on all 4 samples.  A large number of genes were
declared significantly differentially expressed, but this list was
almost completely disjoint from the genes "found" by sage.test (fewer
than 10 out of 4000).  The $r$ values were strongly clustered around 2,
although some were huge.  Incidentally, the "exact" component of the
output does not seem to be described in ?edgeR, but I understand it
to be the p-value from the test.

Then I tested for batch effects by using sage.test on A1 vs. A2, on
B1 vs. B2, and finally on A1 U B1 vs. A2 U B2.  A fairly large number
of genes showed strong batch effects.  These overlapped more with the
within-batch genotype results from sage.test than with the edgeR results.

Just to make things more confusing, the grad student who ran the
samples used the normal approximation to the Poisson to test genotype
effects within batch.  These results were highly concordant between
batches as well, but did not match the sage.test results.  I thought
the p-values would be similar, at least for genes with large counts,
but they were not.
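
To make the comparison concrete, one common form of a normal-approximation
test for two Poisson counts can be sketched as below (Python, standard
library only).  The exact form the student used is not stated here, so the
pooled-rate z statistic, the function name and the arguments are all
assumptions made for illustration.

```python
from math import erfc, sqrt

def poisson_normal_test(x1, n1, x2, n2):
    """Two-sided z-test comparing two Poisson rates (assumed pooled form).

    x1, x2: counts of one tag in the two libraries (hypothetical names)
    n1, n2: library sizes, used as exposures
    Returns (z, p_value).
    """
    k = x1 + x2
    if k == 0:
        return 0.0, 1.0                    # no information for this tag
    pooled = k / (n1 + n2)                 # common rate under the null
    se = sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = erfc(abs(z) / sqrt(2.0))     # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, poisson_normal_test(30, 1000, 10, 1000) gives a z statistic
of about 3.16.  Running such a function and an exact test on the same
counts is one way to see where the two sets of p-values start to diverge.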

I am inclined to go with combining the sage.test results, but any
advice would be very welcome.
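
How the per-batch results would be combined is left open above.  One
possible option, shown purely as an assumption, is Fisher's combined
probability method applied to the two per-batch p-values for each tag.
The sketch below (Python, standard library only, hypothetical function
name) ignores the direction of change, so it would need an extra check
that a tag moves the same way in both batches.

```python
from math import exp, log

def fisher_combine_two(p1, p2):
    """Combine two independent p-values with Fisher's method.

    Requires 0 < p1, p2 <= 1.  Under the null hypothesis,
    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 df.
    """
    x = -2.0 * (log(p1) + log(p2))
    # Closed-form upper tail of the chi-square distribution with 4 df
    return exp(-x / 2.0) * (1.0 + x / 2.0)
```

With two batch p-values of 0.05 each, the combined p-value is about 0.0175.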

Thanks,

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111