[BioC] DESeq for ChIP-seq data normalisation

Ying Wu daiyingw at gmail.com
Wed Nov 6 20:52:03 CET 2013

Hi Rory,
From my understanding, TMM normalization tries to minimize the fold
changes between conditions (it assumes that most regions are not
differentially expressed), and it works best with roughly equal numbers
of up-regulated and down-regulated genes (for the second point, see
Kadota K.'s work extending TMM: http://pubmed.gov/22475125).
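
To make the TMM idea concrete, here is a simplified Python sketch modeled on
the trimmed mean of M-values used by edgeR's calcNormFactors (default trims:
30% of M values and 5% of A values at each end). It is not edgeR's exact code:
ties are broken by position rather than averaged, and the reference sample is
passed in explicitly instead of being chosen automatically. The function name
tmm_factor is hypothetical.

```python
import numpy as np

def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()  # full library sizes

    with np.errstate(divide="ignore", invalid="ignore"):
        p_obs, p_ref = obs / n_obs, ref / n_ref
        m = np.log2(p_obs / p_ref)        # M: log2 fold change of proportions
        a = 0.5 * np.log2(p_obs * p_ref)  # A: average log2 abundance
        # Approximate (delta-method) variance of each M value
        v = (n_obs - obs) / n_obs / obs + (n_ref - ref) / n_ref / ref

    keep = np.isfinite(m) & np.isfinite(a)  # drop regions with a zero count
    m, a, v = m[keep], a[keep], v[keep]
    n = m.size

    def ranks(x):
        """1-based ranks; ties broken by position (edgeR averages ties)."""
        r = np.empty(n, dtype=int)
        r[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
        return r

    lo_m = int(np.floor(n * log_ratio_trim)) + 1
    lo_a = int(np.floor(n * sum_trim)) + 1
    rm, ra = ranks(m), ranks(a)
    trimmed = ((rm >= lo_m) & (rm <= n + 1 - lo_m)
               & (ra >= lo_a) & (ra <= n + 1 - lo_a))

    # Precision-weighted mean of the surviving M values, back on the linear scale
    w = 1.0 / v[trimmed]
    return 2.0 ** (np.sum(w * m[trimmed]) / np.sum(w))
```

In a made-up example where 10 of 100 regions are ten-fold up in obs
(tmm_factor([100]*90 + [1000]*10, [100]*100)), the up regions are trimmed away
and the factor is 10000/19000 (about 0.526), so the effective library size of
obs, 19000 * 0.526, equals that of ref for the unchanged regions.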

With full library size, one no longer assumes that overall binding
levels are similar across samples. But if the motivation for using full
library size is that overall binding levels are very different, wouldn't
most regions then be differentially bound, making TMM's assumption that
most regions are unchanged invalid as well?
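
A toy calculation (hypothetical numbers, plain Python) shows why the choice of
library size matters: if sample B has twice as many reads as A at every peak
but the same sequencing depth, scaling by full library size keeps the two-fold
difference, whereas scaling by reads in peaks (the effective library size)
removes it. The helper names scale and fold_changes are illustrative only.

```python
def scale(counts, lib_size, per=1_000_000):
    """Counts per `per` reads, given a library size."""
    return [c * per / lib_size for c in counts]

def fold_changes(peaks_a, peaks_b, lib_a, lib_b):
    """B/A fold change at each peak after library-size scaling."""
    return [b / a for a, b in zip(scale(peaks_a, lib_a), scale(peaks_b, lib_b))]

# Hypothetical data: same sequencing depth, but B has twice the reads at every peak.
full_a = full_b = 20_000_000
peaks_a = [400, 600, 1000]
peaks_b = [800, 1200, 2000]

fc_full = fold_changes(peaks_a, peaks_b, full_a, full_b)
fc_effective = fold_changes(peaks_a, peaks_b, sum(peaks_a), sum(peaks_b))
print(fc_full)       # [2.0, 2.0, 2.0]
print(fc_effective)  # [1.0, 1.0, 1.0]
```

Neither answer is right in general: if B's extra reads in peaks reflected better
ChIP efficiency rather than more binding, the full-library-size result would be
the misleading one.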


On 11/6/2013 11:16 AM, Rory Stark wrote:
> Hi Ying-
> We actually just changed the default normalization from effective to full
> library size in the most recent release. The reason is that while
> effective is frequently the better choice, it is based on the assumption
> that overall binding levels in all the samples are similar. When this
> assumption is incorrect, it can produce substantially incorrect results;
> using full library size when effective would have been appropriate gives
> answers that are less catastrophically wrong.
> I will certainly still switch normalization to effective library sizes
> when that is the right thing to do, but I have become aware that many
> (most?) DiffBind users don't change the defaults, so we decided that a
> more conservative default was preferable.
> I'm not sure what you mean by "try and minimize changes between
> conditions" in this context.
> Cheers-
> Rory
> On 06/11/2013 19:08, "Ying Wu" <daiyingw at gmail.com> wrote:
>> Hi Rory,
>> Could you give some insight into why TMM is used with full library size?
>> It seems to make sense in the effective library size case, but where full
>> library size is used, would it still be valid to try and minimize
>> changes between conditions?
>> Best,
>> -Ying
>> On 11/05/13 18:18, Rory Stark wrote:
>>> Hi Guiseppe-
>>> You can retrieve the complete matrix of read counts from DiffBind,
>>> either normalized or not, using dba.peakset with bRetrieve=TRUE. You can
>>> set the score to use via dba.count with peaks=NULL and
>>> score=DBA_SCORE_READS, or any of the other possible score values. The
>>> default score is DBA_SCORE_TMM_MINUS_FULL, which is normalized using
>>> edgeR's TMM method, after subtracting the reads in the control, and using
>>> the full library size (not just the reads in peaks) as a scalar.
>>> Cheers-
>>> Rory
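
A rough Python sketch of the kind of score described above for
DBA_SCORE_TMM_MINUS_FULL: control reads are subtracted, then the result is
scaled by the full library size times a TMM factor. This is not DiffBind's
code. The control rescaling, clipping at zero, counts-per-million units, and
the function name control_subtracted_cpm are all assumptions for illustration.

```python
import numpy as np

def control_subtracted_cpm(chip, control, chip_lib, control_lib, tmm_factor=1.0):
    """Hypothetical control-subtracted, full-library-size-normalized score.

    chip, control         -- read counts per peak for a ChIP sample and its control
    chip_lib, control_lib -- full library sizes (all mapped reads, not reads in peaks)
    tmm_factor            -- TMM normalization factor, assumed computed elsewhere
    """
    chip = np.asarray(chip, dtype=float)
    control = np.asarray(control, dtype=float)
    # Assumption: rescale control reads to the ChIP library depth before subtracting.
    scaled_control = control * chip_lib / control_lib
    # Assumption: clip at zero so peaks with more control than ChIP reads are not negative.
    minus = np.clip(chip - scaled_control, 0.0, None)
    # edgeR-style effective library size: full library size times the TMM factor.
    effective_lib = chip_lib * tmm_factor
    return minus * 1e6 / effective_lib
```

For example, control_subtracted_cpm([100, 50], [20, 80], 1_000_000, 1_000_000)
gives [80.0, 0.0]: the second peak has more control than ChIP reads and is
clipped to zero.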

More information about the Bioconductor mailing list