[BioC] edgeR and sagenhaft

Mark Robinson mrobinson at wehi.EDU.AU
Thu Feb 19 21:50:00 CET 2009


Hi Naomi.

Thanks for this.

Well, maybe this isn't too surprising.  The sagenhaft p-value does not
even attempt to account for variability between the replicates.  edgeR
does, and I believe this is the source of the discrepancy.  Based on the
dispersion estimates (your phi=2?), the variability in these genes is too
large for them to be considered significantly different.
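
To make that concrete, here is a minimal sketch in base R using the first
tag in the quoted table and the library sizes from the thread.  It assumes
the sagenhaft comparison behaves like a Fisher's exact test on a single
pair of libraries, and that the dispersion enters as the negative binomial
variance mu + phi*mu^2 with phi=2.  Both are assumptions for illustration;
this does not reproduce either package's actual computation.

```r
# Counts for the first tag in the quoted table and the library sizes from the thread.
counts   <- c(A1 = 46, B1 = 5, A2 = 39, B2 = 12)
lib.size <- c(A1 = 1376421, B1 = 1577198, A2 = 1948700, B2 = 2448499)

# Fisher's exact test on one pair of libraries (A1 vs B1): this tag vs. all other tags.
# It treats the counts as pure sampling noise, with no replicate variability.
tab <- matrix(c(counts["A1"], lib.size["A1"] - counts["A1"],
                counts["B1"], lib.size["B1"] - counts["B1"]), nrow = 2)
p.fisher <- fisher.test(tab)$p.value

# Standard deviation of a count with mean mu, with and without extra variability.
# Assumes var = mu + phi * mu^2; phi = 2 is only the guess made above.
phi <- 2
mu  <- mean(counts[c("A1", "A2")])   # raw counts, ignoring library size differences
sd.poisson <- sqrt(mu)
sd.negbin  <- sqrt(mu + phi * mu^2)
```

Under these assumptions sd.negbin comes out larger than the mean itself,
so a difference that looks extreme under pure sampling noise need not be
significant once replicate variability is allowed for.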

A couple of other considerations:

1. When you built your DGEList, did you do any row filtering?  I typically
remove any rows in the table that have <k total counts (say k=3), since
those carry very little information.  Usually I keep the library sizes
fixed from the outset.  I ask because there may be a lot of these rows,
and I don't know how much they influence the dispersion smoothing.
You could think of this as including a whole bunch of empty spots in a
limma analysis, which you probably wouldn't want to do ...

2. You could also look at the 'pseudo' element of your 'ms' deDGEList
object.  This is the table that the exact test operates on.
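
The row filtering in point 1 could be sketched in base R roughly as below.
The count matrix is made up for illustration, and reading "library sizes
fixed from the outset" as "computed from the full table before filtering"
is an assumption on my part.

```r
# Hypothetical tag-by-library count matrix; the values are made up for illustration.
y <- cbind(A1 = c(46, 0, 1, 109),
           B1 = c(5, 1, 0, 36),
           A2 = c(39, 0, 0, 122),
           B2 = c(12, 1, 0, 63))

# Library sizes taken from the full table, before any rows are dropped.
lib.size <- colSums(y)

# Keep only rows with at least k total counts across all libraries.
k <- 3
keep <- rowSums(y) >= k
y.filtered <- y[keep, , drop = FALSE]
```

The filtered table and the original lib.size would then go into DGEList
in place of y and apply(y,2,sum) in the quoted commands.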

We can talk more offline.

HTH.
Mark



> A1           B1           A2           B2
> 1376421      1577198      1948700      2448499
>
>
>
> At 11:48 PM 2/18/2009, Mark Robinson wrote:
>
>>What are the library sizes?
>>
>>
>>On 19/02/2009, at 3:06 PM, Naomi Altman wrote:
>>
>>>Here is some of the discrepant output.   A1, A2, B1, B2 are the 4
>>>samples with tag counts.
>>>
>>>My commands were
>>>y=cbind(A1,B1,A2,B2)
>>>d<- DGEList(data = y, group = c(1,2,1,2), lib.size = apply(y,2,sum))
>>>alpha <- alpha.approxeb(d)
>>>ms <- deDGE(d, alpha = alpha$alpha)
>>>
>>>Here are the first 5 cases where both replicate pairs had Fisher's exact
>>>test p-values <.001 and the edgeR exact test p-value was >.05.  These
>>>are raw p-values.
>>>
>>>
>>>   A1    B1    A2    B2   Sagenhaft p-value   Sagenhaft p-value   ms$exact
>>>                          A1 vs B1            A2 vs B2
>>>   46     5    39    12   7.546362e-11        3.605042e-06        0.76524756
>>>   33     4    45    13   1.128468e-07        3.791248e-07        0.97325389
>>>    0    55    13    49   1.070343e-15        1.668152e-04        0.08834922
>>>   69   179    92   526   2.088333e-09        3.816029e-55        0.56544378
>>>  109    36   122    63   3.988028e-12        4.556175e-09        0.86574462
>>>
>>>
>>>
>>>Naomi S. Altman                                814-865-3791 (voice)
>>>Associate Professor
>>>Dept. of Statistics                              814-863-7114 (fax)
>>>Penn State University                         814-865-1348 (Statistics)
>>>University Park, PA 16802-2111
>>>
>>
>>------------------------------
>>Mark Robinson
>>Epigenetics Laboratory, Garvan
>>Bioinformatics Division, WEHI
>>e: m.robinson at garvan.org.au
>>e: mrobinson at wehi.edu.au
>>p: +61 (0)3 9345 2628
>>f: +61 (0)3 9347 0852
>>------------------------------
>>
>>
>>
>>
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
>


