[BioC] Difference between EdgeR and DeSeq in library normalization

Fri Mar 15 19:58:11 CET 2013

Hi Lucia

On 15/03/13 16:43, Lucia Peixoto wrote:
> I am currently analyzing an RNASeq dataset, I have 3 samples with n=4 each.
> I was exploring the performance of both EdgeR and DeSeq and I noticed they
> vary a lot on the dispersion of the normalization factors.
> Using EdgeR calcNormFactors I get a distribution that varies from 0.9-1.2
> while if I use DeSeq estimateSizeFactors the distribution varies from
> 0.4-1.7. Given that these are exactly the same libraries
> why do the estimates vary so much? How will that impact the list of DEgenes?
> I know that the calculations are not performed in the same way, but aren't
> those two functions aimed at estimating the same phenomenon?

edgeR's normalization factors are relative to the total read count, and 
DESeq's size factors aren't. So, if you want to compare them, you have to 
multiply the factors from edgeR by the total read counts and divide by 
some suitable big number.

So, if sf is a vector of size factors from DESeq, nf is a vector of 
normalization factors from edgeR, and rs is the vector with the column 
sums of the count matrix, I would expect that

    plot( sf, rs * nf )

gives a plot with the points lying roughly on a straight line.
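
As a rough sketch of that comparison, the following base R snippet works 
on simulated counts rather than real data. The depths, gene means and the 
variable names other than sf, nf and rs are made up for illustration. The 
size factors are computed with a hand-written median-of-ratios step, and 
the relative factors use an upper-quartile rule as a simple stand-in for 
edgeR's default TMM method, so the numbers will not match what the 
packages report.

```r
# Simulated data only: 1000 genes x 4 libraries of different depth.
set.seed(1)
depth  <- c(0.5, 1, 1.5, 2)                  # hypothetical relative depths
mu     <- rexp(1000, rate = 1 / 200)         # hypothetical per-gene means
counts <- sapply(depth, function(d) rpois(length(mu), mu * d))

# rs: column sums of the count matrix (total reads per library).
rs <- colSums(counts)

# sf: median-of-ratios size factors, written out in base R.
# Genes with a zero in any library are skipped (their log mean is -Inf).
log_geo_mean <- rowMeans(log(counts))
keep <- is.finite(log_geo_mean)
sf <- apply(counts, 2, function(x) exp(median(log(x[keep]) - log_geo_mean[keep])))

# nf: upper-quartile factors relative to library size, rescaled so that
# they multiply to 1. This is a stand-in for TMM, not TMM itself.
expressed <- rowSums(counts) > 0
nf <- sapply(seq_along(rs), function(j) quantile(counts[expressed, j] / rs[j], 0.75))
nf <- nf / exp(mean(log(nf)))

print(range(sf))   # spread reflects sequencing depth
print(range(nf))   # spread is what remains once depth is divided out

# Same footing: multiply nf by the total counts, divide by a convenient constant.
plot(sf, rs * nf / mean(rs))
```

With real data one would presumably take sf from estimateSizeFactors and 
nf from calcNormFactors instead of the hand-written steps above.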

   Simon