[BioC] Fwd: Re: edgeR and sagenhaft

Naomi Altman naomi at stat.psu.edu
Sun Feb 15 17:25:11 CET 2009


>To: "Mark Robinson" <mrobinson at wehi.EDU.AU>
>From: Naomi Altman <naomi at stat.psu.edu>
>Subject: Re: [BioC] edgeR and sagenhaft
>Date: Sun, 15 Feb 2009 11:21:54 -0500
>
>Dear Mark,
>Thanks for your feedback.
>
>Here are some replies to your comments:
>
>These are biological replicates.  There is also a technical 
>replicate that was sequenced using another method, but I did not use 
>it for this analysis.
>(It also has some interesting behavior.)
>
>For lib.size I use the total number of reads.  This is RNA-seq data, 
>so the number is large.  The lib sizes vary by about 20%.
>I did the batch tests pairwise exactly as if they were different genotypes.
>
>edgeR reported about 4x as many differentially expressed genes as 
>sage.test.  But there was almost no overlap with any of the genes 
>selected as significant by sage.test.
>
>I am going to resort to my usual advice and look at some of the 
>counts for genes tagged by each method as differentially 
>expressed.  I will report back if I solve the mystery.
>
>Thanks,
>Naomi
>
>At 07:35 PM 2/13/2009, you wrote:
>>Hi Naomi.
>>
>>Curious.  A bit difficult to diagnose without digging into it.  There is
>>probably a reasonable explanation for all of this.
>>
>>For what it's worth, a few comments/queries below.
>>
>>
>> > I have 4 large tag datasets  A1, A2 and B1, B2.  The purpose of the
>> > experiment was to determine differences in gene expression between A and
>> > B.
>> > A1 and B1 were done together as batch 1, and  A2 and B2 were done
>> > together as batch 2.
>>
>>First question: are these technical replicates or biological?  If
>>technical, you may consider the 'doPoisson=TRUE' option of deDGE() since
>>that effectively sets r large (dispersion small), making it a Poisson
>>calculation.
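
The "r large, dispersion small" remark can be illustrated numerically. This is a minimal Python sketch, not edgeR code. It assumes the usual negative binomial parameterization with mean mu and size r, where the variance is mu + mu^2/r and the dispersion is 1/r. The function name nb_variance is hypothetical.

```python
def nb_variance(mu, r):
    """Variance of a negative binomial count with mean mu and size r.

    Assumed parameterization: variance = mu + mu^2 / r, so the
    dispersion is 1/r and r -> infinity gives the Poisson variance mu.
    """
    return mu + mu * mu / r


mu = 100.0
for r in (2.0, 20.0, 1e6):
    var = nb_variance(mu, r)
    # variance/mean equals 1 for a Poisson count
    print(f"r={r:g}  variance={var:.2f}  variance/mean={var / mu:.2f}")
```

Under this assumption, a tag with mean count 100 and r = 2 has variance 5100, which is 51 times the Poisson variance, while a very large r is practically indistinguishable from Poisson.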
>>
>>
>> > I ran several analyses and am completely puzzled.
>> >
>> > First I ran sage.test (Fisher's exact test) on A1, B1 and on A2,
>> > B2.  The results were strongly concordant in that there was a lot of
>> > overlap in the significant gene list,
>> > and the same genes were up/down regulated (on the whole).
>> >
>> > Then I ran edgeR on all 4 samples.  A large number of genes were
>> > declared significantly differentially expressed, but this set was almost
>> > completely disjoint from the genes "found" by sage.test. (Fewer than
>> > 10 out of 4000).  The $r$ values were strongly clustered around 2,
>> > although some were huge.  Incidentally, the "exact" component of the
>> > output does not seem to be described in ?edgeR, but I understand it
>> > to be the p-value from the test.
>>
>>'r' values around 2 suggest there is significant variation over and above
>>Poisson.  But, maybe this is due to batch effects.
>>
>>Indeed, the 'exact' element is the p-value from the exact test proposed in
>>the paper.
>>
>>What do you use for 'lib.size' -- total number of reads?  Are they
>>drastically different from batch-to-batch/sample-to-sample?  How do the
>>batch effects manifest -- more total reads giving higher overall counts,
>>or something different?
>>
>>
>> > Then I tested for batch effects by using sage.test on A1, A2 and  on
>> > B1, B2 and finally on A1 U B1 and A2 U B2.  A fairly large number of
>> > genes showed strong batch effects.  These overlapped more with the
>> > genotype within batch sage.test results than with the edgeR results.
>>
>>
>>Strong batch effects that aren't explained by total counts would result in
>>higher dispersion estimates (lower values of 'r') in edgeR, thus giving
>>fewer DE genes.  So, maybe this explains some of the lower overlap here.
>>
>>
>> > Just to make things more confusing, the grad student who ran the
>> > samples used the normal approximation to the Poisson to test genotype
>> > effects within batch.  These
>> > were highly concordant between batches as well, but did not match the
>> > sage.test results.  I thought the p-values would be similar at least
>> > for genes with large counts, but they were not.
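
The expectation that exact and normal-approximation p-values agree for large counts can be checked on a single tag. This is a minimal Python sketch under stated assumptions, and it is neither the sage.test implementation nor the student's actual test, whose form the thread does not give. It assumes both tests condition on the tag total x1 + x2, so that under the null x1 is binomial with success probability n1 / (n1 + n2), where n1 and n2 are the library sizes. All function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of k, via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def exact_binomial_p(x1, x2, n1, n2):
    """Two-sided exact p-value for one tag, conditioning on x1 + x2."""
    n = x1 + x2
    p = n1 / (n1 + n2)
    log_obs = log_binom_pmf(x1, n, p)
    total = 0.0
    for k in range(n + 1):
        lp = log_binom_pmf(k, n, p)
        # sum every outcome that is no more likely than the observed one
        if lp <= log_obs + 1e-9:
            total += math.exp(lp)
    return min(1.0, total)


def normal_approx_p(x1, x2, n1, n2):
    """Two-sided p-value from the normal approximation to the same binomial."""
    n = x1 + x2
    if n == 0:
        return 1.0  # no reads for this tag, so no evidence either way
    p = n1 / (n1 + n2)
    z = (x1 - n * p) / math.sqrt(n * p * (1 - p))
    return math.erfc(abs(z) / math.sqrt(2))


# Equal library sizes (made-up numbers); one small-count and one large-count tag.
for x1, x2 in ((1, 8), (300, 360)):
    print(x1, x2,
          exact_binomial_p(x1, x2, 1_000_000, 1_000_000),
          normal_approx_p(x1, x2, 1_000_000, 1_000_000))
```

In this sketch the two p-values differ noticeably for the small-count tag and are close for the large-count tag. If two such tests disagree even at large counts, a difference in how they are set up (for example, how library sizes enter) would be one thing to check.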
>> >
>> > I am inclined to go with combining the sage.test results, but any
>> > advice would be very welcome
>>
>>
>>Not sure I've really contributed much, but there must be a reasonable
>>explanation.
>>
>>Mark
>>
>>
>>
>>
>> >
>> > Thanks,
>> >
>> > Naomi S. Altman                                814-865-3791 (voice)
>> > Associate Professor
>> > Dept. of Statistics                              814-863-7114 (fax)
>> > Penn State University                         814-865-1348 (Statistics)
>> > University Park, PA 16802-2111
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>