[BioC] Difference between edgeR and DESeq in library normalization

Ryan C. Thompson rct at thompsonclan.org
Fri Mar 15 23:43:37 CET 2013


If you use the "getOffset" function for your DGEList object and the 
following function for your CountDataSet object, you will get offset 
values that are directly comparable:

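library(DESeq)    # makeExampleCountDataSet, estimateSizeFactors, sizeFactors
library(edgeR)    # DGEList, calcNormFactors, getOffset
library(ggplot2)

# Put DESeq size factors on edgeR's log-offset scale: shift the log size
# factors so that their mean equals the mean log library size.
getOffset.CountDataSet <- function(y) {
  if (any(is.na(sizeFactors(y))))
    stop("Call estimateSizeFactors first")
  sf <- sizeFactors(y)
  log(sf) - mean(log(sf)) + mean(log(colSums(counts(y))))
}

cds <- makeExampleCountDataSet()
cds <- estimateSizeFactors(cds)
dge <- DGEList(counts=counts(cds), group=pData(cds)$condition)
dge <- calcNormFactors(dge)

# Points on the identity line mean the two methods give the same offsets.
offsets <- data.frame(edgeR=getOffset(dge), DESeq=getOffset.CountDataSet(cds))
ggplot(offsets, aes(x=edgeR, y=DESeq)) +
    geom_point() +
    geom_abline(slope=1, intercept=0) +
    coord_equal() +
    labs(title="Offsets, DESeq vs edgeR",
         x="edgeR offset", y="DESeq offset")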
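For anyone without DESeq installed, the same conversion can be sketched in base R. This is a minimal illustration, assuming DESeq's default median-of-ratios size factors; the simulated count matrix and the helper names medianRatioSizeFactors and sizeFactorsToOffset are hypothetical and are not part of either package.

```r
# Hypothetical helper: DESeq-style median-of-ratios size factors.
# Each sample's factor is the median, over genes with no zero counts,
# of its counts divided by that gene's geometric mean across samples.
medianRatioSizeFactors <- function(counts) {
  logGeoMeans <- rowMeans(log(counts))
  keep <- is.finite(logGeoMeans)  # genes with any zero count give -Inf
  apply(counts[keep, , drop = FALSE], 2, function(col)
    exp(median(log(col) - logGeoMeans[keep])))
}

# Hypothetical helper mirroring getOffset.CountDataSet: shift the log size
# factors so that their mean equals the mean log library size.
sizeFactorsToOffset <- function(sf, counts) {
  log(sf) - mean(log(sf)) + mean(log(colSums(counts)))
}

# Simulated toy data: 200 genes, 4 samples with different sequencing depths.
set.seed(1)
depth <- c(1, 2, 0.5, 1.5)
mu <- rexp(200, rate = 1/50)
counts <- sapply(depth, function(d) rpois(200, lambda = mu * d))

sf <- medianRatioSizeFactors(counts)
offset <- sizeFactorsToOffset(sf, counts)
print(round(sf, 2))
print(round(exp(offset)))
```

Because the offset differs from log(sf) only by a constant, exp(offset) is just the size factors rescaled to the geometric mean library size. That is what makes it directly comparable with edgeR's getOffset(), which for a DGEList without a stored offset is log(lib.size * norm.factors).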


On Fri 15 Mar 2013 11:58:11 AM PDT, Simon Anders wrote:
> Hi Lucia
>
> On 15/03/13 16:43, Lucia Peixoto wrote:
>> I am currently analyzing an RNA-Seq dataset; I have 3 samples with
>> n=4 each. I was exploring the performance of both edgeR and DESeq,
>> and I noticed that they vary a lot in the dispersion of the
>> normalization factors. Using edgeR calcNormFactors I get a
>> distribution that varies from 0.9 to 1.2, while with DESeq
>> estimateSizeFactors the distribution varies from 0.4 to 1.7. Given
>> that these are exactly the same libraries, why do the estimates vary
>> so much? How will that impact the list of DE genes?
>> I know that the calculations are not performed in the same way, but
>> aren't those two functions aimed at estimating the same phenomenon?
>
> edgeR's normalization factors are relative to the total read count,
> and DESeq's size factors aren't. So, if you want to compare them, you
> have to multiply the factors from edgeR by the total read counts and
> divide by some suitable big number.
>
> So, if sf is a vector of size factors from DESeq, nf is a vector of
> normalization factors from edgeR, and rs is the vector with the column
> sums of the count matrix, I would expect that
>
>    plot( sf, rs * nf )
>
> gives a plot with the points lying roughly on a straight line.
>
>   Simon
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
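Simon's argument can also be illustrated in base R by going the other way, turning size factors into edgeR-style normalization factors. This is a sketch under stated assumptions: the values of sf and rs below are made up for illustration, and it relies on edgeR rescaling its normalization factors so that their geometric mean is 1.

```r
# Made-up size factors (DESeq scale) and library sizes (column sums).
sf <- c(0.55, 1.70, 0.90, 1.20)
rs <- c(6.1e6, 1.8e7, 1.0e7, 1.4e7)

# Normalization factors relative to library size, rescaled so that their
# geometric mean (equivalently, their product) is 1.
nf <- sf / rs
nf <- nf / exp(mean(log(nf)))

# Effective library sizes rs * nf are proportional to sf, so this ratio
# is the same for every sample (Simon's "suitable big number").
ratio <- (rs * nf) / sf
print(round(nf, 3))
print(ratio)
```

In this made-up example nf stays close to 1 while sf spans a much wider range, because the differences in sequencing depth are carried by rs rather than by the normalization factors.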


