[BioC] [EDGER] Normalization issue

Fri Jul 20 15:57:08 CEST 2012

Dear all,

I am a master's student in France, working on RNA-seq data.
I am trying to run a differential gene expression analysis
using edgeR, starting with 2 conditions x 2 replicates = 4 runs
(Illumina, mapped with Bowtie to a known reference genome). I have a
few questions about the normalization of the dataset.

As I understand it, normalization is needed to correct for differences
in library size between samples. It is done with the TMM method, by
calling the calcNormFactors() function.
This gives a normalization factor for each sample, which corresponds
to an offset in the model used to test for differentially expressed
genes.
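To make the normalization step concrete, here is a minimal Python sketch (not edgeR's R code) of a simplified TMM-style factor. It assumes counts are stored as one list per sample, takes the first sample as the reference, and uses only an unweighted trimmed mean of the log-ratios (M-values); edgeR's calcNormFactors() also trims on average expression and applies precision weights, so its factors will differ. All helper names are hypothetical, and the model offset is taken here as log(library size x normalization factor).

```python
import math

def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k] if len(v) - 2 * k > 0 else v
    return sum(kept) / len(kept)

def simple_tmm_factor(sample, ref, trim=0.3):
    """Simplified (hypothetical) TMM: unweighted trimmed mean of the log2
    ratios of library-size-scaled counts, `sample` versus `ref`.
    Genes with a zero count in either sample are skipped."""
    n_s, n_r = sum(sample), sum(ref)
    m_values = [
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, ref)
        if s > 0 and r > 0
    ]
    return 2 ** trimmed_mean(m_values, trim)

def normalization_factors(counts, ref_index=0, trim=0.3):
    """One factor per sample, rescaled so that their product is 1."""
    ref = counts[ref_index]
    f = [simple_tmm_factor(s, ref, trim) for s in counts]
    geo_mean = math.exp(sum(math.log(x) for x in f) / len(f))
    return [x / geo_mean for x in f]

def log_offsets(counts, factors):
    """Assumed model offset per sample: log(library size * factor)."""
    return [math.log(sum(s) * f) for s, f in zip(counts, factors)]

# Toy data (hypothetical): one gene dominates the second library.
counts = [[10] * 10, [10] * 9 + [910]]
factors = normalization_factors(counts)
effective = [sum(s) * f for s, f in zip(counts, factors)]
offsets = log_offsets(counts, factors)
print(factors, effective, offsets)
```

In this toy example the raw library sizes differ tenfold because a single gene soaks up most of the second sample's reads. After the factors are applied, the effective library sizes are equal, so the nine genes with identical counts are no longer seen as down-regulated.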

The function estimateCommonDisp() estimates the dispersion, and
exactTest() runs the differential analysis (an exact negative
binomial test). But according to the edgeR manual, both of these
functions call the equalizeLibSizes() function in order to generate
pseudo-counts (which also correct for library size).
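The pseudo-count idea can be illustrated with an equally hedged Python sketch. It is not what equalizeLibSizes() does internally: edgeR uses a quantile-to-quantile adjustment under the negative binomial model, whereas this sketch simply rescales counts linearly. It assumes that the library sizes being equalized are effective ones (library size x normalization factor) and that the common library size is their geometric mean; `pseudo_counts_linear` is a hypothetical helper, and the factors in the example are assumed values.

```python
import math

def pseudo_counts_linear(counts, lib_sizes, norm_factors):
    """Rescale each sample's counts to a common library size.

    ASSUMPTIONS: effective library size = library size * normalization
    factor; the common size is the geometric mean of the effective sizes;
    plain linear scaling stands in for edgeR's quantile adjustment.
    """
    effective = [n * f for n, f in zip(lib_sizes, norm_factors)]
    common = math.exp(sum(math.log(e) for e in effective) / len(effective))
    pseudo = [
        [c * common / e for c in sample]
        for sample, e in zip(counts, effective)
    ]
    return pseudo, common

# Toy data (hypothetical): one gene dominates the second library.
counts = [[10] * 10, [10] * 9 + [910]]
lib_sizes = [sum(s) for s in counts]

# Raw library sizes only (all factors set to 1).
raw_pseudo, _ = pseudo_counts_linear(counts, lib_sizes, [1.0, 1.0])

# Assumed factors that make the effective library sizes equal.
f = 1 / math.sqrt(10)
norm_pseudo, common = pseudo_counts_linear(counts, lib_sizes, [1 / f, f])
print(raw_pseudo[0][0], raw_pseudo[1][0], norm_pseudo[0][0], norm_pseudo[1][0])
```

With the factors included, the nine unchanged genes get equal pseudo-counts in both samples; equalizing the raw library sizes alone would make them look about tenfold lower in the second sample.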

What I do not understand here is that the library sizes should
already have been corrected by the TMM method.

So, finally, my question is:
What is the difference between calcNormFactors() and
equalizeLibSizes()? Do the pseudo-counts generated by
equalizeLibSizes() take the normalization factors into account?

I hope I have been clear enough and that you will be able to help me.

Thanks a lot,

François