[BioC] DEseq for chip-seq data normalisation

Rory Stark Rory.Stark at cruk.cam.ac.uk
Wed Nov 6 20:16:16 CET 2013

Hi Ying-

We actually just changed the default normalization from effective to full
library size in the most recent release. The reason is that while
effective is frequently a better choice, it is based on the assumption
that overall binding levels in all the samples are similar. When this
assumption is incorrect, it can produce substantially incorrect results;
using full library size when effective would have been appropriate gives
less catastrophically wrong answers.
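
As a toy illustration of the difference (plain scalar scaling in Python,
not TMM and not DiffBind's actual code; the sample names, counts, and
library sizes are all made up), suppose sample B has half the binding of
sample A at every peak:

```python
def scale_counts(counts, lib_sizes):
    """Scale each sample's peak counts down to the smallest library size.

    This is simple scalar normalization, used here only to show the effect
    of the choice of library size. It is NOT edgeR's TMM method.
    """
    target = min(lib_sizes.values())
    return {
        sample: [c * target / lib_sizes[sample] for c in peak_counts]
        for sample, peak_counts in counts.items()
    }


# Hypothetical read counts in three peaks; B has half of A's binding everywhere.
counts = {"A": [100, 200, 300], "B": [50, 100, 150]}

# Full library size: all sequenced reads (assumed equal depth in this sketch).
full = {"A": 1_000_000, "B": 1_000_000}

# Effective library size: only the reads that fall in peaks.
effective = {sample: sum(c) for sample, c in counts.items()}

by_full = scale_counts(counts, full)
by_effective = scale_counts(counts, effective)

print("full:     ", by_full)
print("effective:", by_effective)
```

Scaling by effective library size makes the two samples look identical,
hiding the global loss of binding in B, while scaling by full library
size preserves the two-fold difference.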

I will definitely be changing normalization to use effective sizes when it
is the right thing to do, but I have become aware that many (most?)
DiffBind users don't change the defaults, so we determined that a more
conservative default was preferable.

I'm not sure what you're asking regarding "try and minimize changes
between conditions" in this context?


On 06/11/2013 19:08, "Ying Wu" <daiyingw at gmail.com> wrote:

>Hi Rory,
>Could you give some insight into why TMM is used with full library size?
>It seems to make sense in the effective library size case, but where full
>library size is used, would it still be valid to try and minimize
>changes between conditions?
>On 11/05/13 18:18, Rory Stark wrote:
> > Hi Guiseppe-
> >
> > You can retrieve the complete matrix of read counts from DiffBind,
> > normalized or not, using dba.peakset with bRetrieve=TRUE. You can set the
> > score to use via dba.count with peaks=NULL and score=DBA_SCORE_READS, or
> > any of the other possible score values. The default score is
> > DBA_SCORE_TMM_MINUS_FULL, which is normalized using edgeR's TMM method,
> > after subtracting the reads in the control, and using the full library
> > size (not just the reads in peaks) as a scalar.
> >
> > Cheers-
> > Rory
> >
