[BioC] DESeq normalisation strategy
Davide Cittaro
cittaro.davide at hsr.it
Thu May 30 08:08:22 CEST 2013
Hi Simon,
On May 29, 2013, at 11:46 AM, Simon Anders <anders at embl.de> wrote:
> Hi Davide
>
> On 29/05/13 10:58, Davide Cittaro wrote:
>> I've been reading about DESeq's normalization strategy and, as far as I understand, it works on a per-sample basis: the counts for each sample are normalized by a factor calculated using the geometric mean of the counts.
>> Three questions:
>> - is this strategy robust when comparing samples with extremely different library sizes?
>
> Sure, why shouldn't it be?
>
You know, just a check :-)
In a small dataset I artificially reduced the counts for one sample by different factors and checked the ratios between the counts of that sample and an invariant one. Indeed they differ, but the RMS deviation is really small.
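For readers who want to repeat this kind of check, here is a minimal Python sketch. It is not DESeq's own code, since DESeq is an R package. It assumes the median-of-ratios form of the size factor: each sample's factor is the median, over features with no zero count, of the ratio between that sample's count and the feature's geometric mean across samples. The simulation (gamma-distributed means, Poisson counts, binomial thinning of one sample) and the helper names `size_factors` and `downsampling_check` are hypothetical. It compares estimated size-factor ratios with the thinning factor rather than reproducing the exact RMS comparison described above.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors, one per column of a features x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)          # log(0) -> -inf
    log_geo_means = log_counts.mean(axis=1)  # -inf for any row containing a zero
    usable = np.isfinite(log_geo_means)      # keep only rows with no zero count
    if not usable.any():
        raise ValueError("every feature has a zero count in at least one sample")
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


def downsampling_check(factors, n_features=5000, seed=0):
    """Thin one of two replicate samples and compare the estimated size-factor ratio with the true factor."""
    rng = np.random.default_rng(seed)
    # Hypothetical simulated data: gamma-distributed means, Poisson counts.
    means = rng.gamma(shape=2.0, scale=500.0, size=n_features)
    reference = rng.poisson(means)
    sample = rng.poisson(means)
    results = {}
    for f in factors:
        thinned = rng.binomial(sample, f)  # keep each read with probability f
        sf = size_factors(np.column_stack([reference, thinned]))
        results[f] = sf[1] / sf[0]
    return results


if __name__ == "__main__":
    for f, ratio in downsampling_check([0.5, 0.2, 0.1]).items():
        print(f"factor {f}: estimated ratio {ratio:.4f}")
```

Because thinning by a factor f scales the expected counts by f, the estimated ratio should sit close to f as long as most features keep reasonably high counts after thinning.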
>
> The notion of "calculating cpm on normalized counts" is hence a
> contradiction in terms.
I somewhat agree with you, but I'm a bit puzzled that I've seen this in other packages (such as edgeR, but that may be another story).
>
>> - counts are calculated on genomic intervals, would the same approach make sense if I use counts on single nucleotides?
>
> In principle, yes. The problem is that once your features are very small,
> very many of the counts may be zero, and the geometric mean of any set
> of numbers containing at least one zero is zero. Hence, you can only use
> features with sufficiently high counts to get a stable estimate, and you
> may not have enough of these.
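A small Python illustration of the zero-count problem, under a hypothetical Poisson model of sparse counts: the geometric mean collapses to zero as soon as one value is zero, and the fraction of features with a non-zero count in every sample shrinks quickly as the mean count per feature drops. The helper names `geometric_mean` and `usable_fraction` are illustrative only.

```python
import numpy as np


def geometric_mean(values):
    """Geometric mean computed in log space; any zero makes it exactly 0."""
    values = np.asarray(values, dtype=float)
    with np.errstate(divide="ignore"):
        return float(np.exp(np.log(values).mean()))


def usable_fraction(mean_count, n_samples=4, n_features=10000, seed=1):
    """Fraction of simulated features with a non-zero count in every sample.

    Counts are drawn from a hypothetical Poisson model with the same mean for
    every feature and sample; only such features enter a median-of-ratios estimate.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mean_count, size=(n_features, n_samples))
    return float((counts > 0).all(axis=1).mean())


if __name__ == "__main__":
    print(geometric_mean([0, 50, 100]))
    for m in (20.0, 2.0, 0.5):
        expected = (1.0 - np.exp(-m)) ** 4  # analytic value for 4 samples
        print(f"mean count {m}: usable {usable_fraction(m):.3f} (expected {expected:.3f})")
```

Under this model the expected usable fraction is (1 - e^(-m))^n for mean count m and n samples, which is only about 2.4% for m = 0.5 and n = 4.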
Well, that also happens with intervals, especially in some kinds of ChIP-seq experiments. The way you calculate the factors goes through log(counts), so intervals with at least one zero count are excluded. I tried estimating the size factors on a random 1/10 of my dataset, and the factor estimates are quite robust.
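The subsampling check described above can be sketched in Python as follows. This assumes the 1/10 subsample is drawn over intervals (rows) rather than over reads, and it reuses the same assumed median-of-ratios estimator. The simulated size factors, the functions `subsample_stability` and `simulate_counts`, and their parameters are hypothetical.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors, one per column of an intervals x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # drop intervals with a zero in any sample
    if not usable.any():
        raise ValueError("every interval has a zero count in at least one sample")
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


def subsample_stability(counts, fraction=0.1, n_draws=20, seed=2):
    """Re-estimate size factors on random subsets of intervals.

    Returns the full-data estimate and the largest relative deviation of any
    subsample estimate from it.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    full = size_factors(counts)
    n_rows = counts.shape[0]
    k = max(1, int(round(fraction * n_rows)))
    worst = 0.0
    for _ in range(n_draws):
        rows = rng.choice(n_rows, size=k, replace=False)
        sub = size_factors(counts[rows])
        worst = max(worst, float(np.max(np.abs(sub / full - 1.0))))
    return full, worst


def simulate_counts(true_factors, n_intervals=20000, seed=3):
    """Hypothetical data: gamma-distributed interval means scaled by per-sample factors."""
    rng = np.random.default_rng(seed)
    means = rng.gamma(shape=2.0, scale=100.0, size=n_intervals)
    return rng.poisson(means[:, None] * np.asarray(true_factors)[None, :])


if __name__ == "__main__":
    counts = simulate_counts([0.5, 1.0, 2.0])
    full, worst = subsample_stability(counts)
    print("full-data size factors:", np.round(full, 3))
    print(f"largest relative deviation over subsamples: {worst:.3%}")
```

Since the median is a robust statistic, a random tenth of the intervals usually pins it down well when those intervals still carry moderate counts; with very sparse signals the zero-count filter above shrinks the usable set first.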
My problem, in case it was not clear, is that I would like a normalization strategy for signals across the genome. Typically these are at the small-interval level (less than 200 bp).
Thanks
d