[BioC] Normalization by DEseq
Wolfgang Huber
whuber at embl.de
Tue Oct 19 15:10:36 CEST 2010
Dear Laurie
Normalisation: Briefly, the normalisation works as follows: if k_ij is
the count of the i-th gene (or in your case, I guess, taxon) in the j-th
sample, then we compute f_i as the geometric mean of these values across
samples. The size factor s_j of the j-th sample is the median, over all
genes i, of the ratios k_ij / f_i. The normalised count is k_ij / s_j.
More detail is given in the paper "Differential expression analysis
for sequence count data". A preprint is available at Nature Precedings
(4282), 2010; the full publication will come out in Genome Biology.
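
As a rough illustration of the size-factor computation described in that
paper (median of the ratios of each count to the per-gene geometric mean),
here is a minimal Python sketch. It is not the DESeq code itself: the
function names size_factors and normalise and the toy count table are made
up for this example, and the sketch assumes that rows containing a zero are
left out of the median, since their geometric mean is zero and the ratio is
undefined.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample (column) by median of ratios.

    counts: list of rows (genes or taxa), each a list of per-sample counts.
    """
    n_samples = len(counts[0])
    # Log of the geometric mean of each row; None marks rows with a zero,
    # which are skipped below (assumption of this sketch, see lead-in).
    log_geo_means = []
    for row in counts:
        if all(k > 0 for k in row):
            log_geo_means.append(sum(math.log(k) for k in row) / n_samples)
        else:
            log_geo_means.append(None)
    factors = []
    for j in range(n_samples):
        # log(k_ij / f_i) for every usable row i in sample j
        log_ratios = [
            math.log(row[j]) - g
            for row, g in zip(counts, log_geo_means)
            if g is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts):
    """Divide every count by the size factor of its sample."""
    s = size_factors(counts)
    return [[k / s_j for k, s_j in zip(row, s)] for row in counts]


# Toy table: sample 2 is sequenced twice as deeply as sample 1; the last
# row is a group-specific taxon that is absent from sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
s = size_factors(counts)
normalised = normalise(counts)
```

In this toy table the group-specific row [0, 7] does not enter the size
factors, but it is still divided by them and stays in the normalised data.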
Zero counts: The statistical model of DESeq includes situations in which
the counts are zero in one group and non-zero in others, so I would
recommend leaving these taxa in the data, because you will benefit from
getting proper statistical inference for these cases, too.
(Normalisation should, as far as I can see, not be significantly
affected, unless there is some really odd asymmetry in your data.)
Best wishes
Wolfgang
On Oct/19/10 6:56 AM, Rui Luo wrote:
> Dear DEseq developers,
> I have a few questions related to the normalization step in DEseq.
> It is stated that it will normalize the raw counts by library size,
> but what is the mathematical idea? Would you mind giving a more detailed
> explanation?
> Now I have two groups of metatranscriptome data; one group contains
> H. pylori, the other does not. So I certainly have some transcripts in
> the first group that are from H. pylori and are absent from group two.
> If I want to do differential expression analysis for these two groups,
> should I filter out the group-specific transcripts before putting the
> data into DEseq? Will this affect the normalization step?
> Thanks!
> best,
> Laurie
>
>