[BioC] question about normalization of RNAseq by tweeDEseq using TMM from edgeR

Mon Feb 13 18:52:57 CET 2012

Dear Sermsawat,

normalizeCounts() applies the edgeR TMM normalization in the same way
that the edgeR function exactTest() does: once the TMM normalization
factors have been calculated, library sizes are equalized with
equalizeLibSizes(), which adjusts the counts to a common effective
library size. The adjusted counts are what you see in yN, and this is
why some entries that were 0 in the raw table are no longer 0. Let me
warn you, however, that you should *not* use normalizeCounts() from the
tweeDEseq package to produce a table that you later feed into some other
package for differential expression analysis, such as edgeR or DESeq. If
you are going to use another package for DE analysis, then you should
check its own documentation to see how to input and normalize your data.
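
To see how an adjustment of this kind can move a zero count away from
zero, here is a small base-R sketch of quantile-to-quantile mapping
between two negative binomial distributions. It is only an illustration
and not the code that edgeR or tweeDEseq actually run, which differs in
detail: the helper name quantile_adjust and the values chosen for mu_in,
mu_out and dispersion are assumptions made up for this example.

```r
# Illustrative only: map a count observed under a negative binomial with
# mean mu_in to the matching quantile of a negative binomial with mean
# mu_out. quantile_adjust() is a hypothetical helper, not an edgeR or
# tweeDEseq function.
quantile_adjust <- function(x, mu_in, mu_out, dispersion) {
  size <- 1 / dispersion                     # NB size parameter
  p <- pnbinom(x, size = size, mu = mu_in)   # P(X <= x) under the input mean
  qnbinom(p, size = size, mu = mu_out)       # smallest count reaching that probability
}

# Assumed values: 2 reads expected in a small library, 10 reads expected
# at the common (equalized) library size.
adjusted_zero <- quantile_adjust(0, mu_in = 2, mu_out = 10, dispersion = 0.1)
print(adjusted_zero)

# For comparison: when input and output means agree, the count is
# expected to stay where it was.
unchanged_zero <- quantile_adjust(0, mu_in = 2, mu_out = 2, dispersion = 0.1)
print(unchanged_zero)
```

The point is only that an adjustment based on matching quantiles, unlike
a plain rescaling by a constant, need not keep a zero at zero.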

cheers,
robert.

On Mon, 2012-02-13 at 00:54 -0500, Sermsawat Tunlaya-Anukit wrote:
> I have a question about normalization in the tweeDEseq package, which uses
> the TMM method from edgeR to normalize count data. I ran the normalization
> as described in the manual and found something unusual. The read count of
> gene 4 in samples X1 and X2 is 0 before normalization, but after
> normalization it becomes 4 and 3. Why does normalization add counts to a
> count of 0? Is this an effect of the tagwise dispersions? I have posted my
> code below for more information. Thank you in advance.
> 
> Sermsawat Tunlaya-anukit
> 
> > library(tweeDEseq)
> > y <- read.table("rawcount.txt", header=TRUE)
> > group <- c(1,1,1,2,2,2,2,3,3,3,4,4)
> > yN <- normalizeCounts(y, group)
> Using edgeR normalization methods.
> Calculating library sizes from column totals.
> Calculating normalization factors with the TMM method.
> Estimating common dispersion.
> Estimating tagwise dispersions.
> Calculating effective library sizes.
> Adjusting counts to effective library sizes using tagwise dispersions.
> > head(y)
>    X1  X2   X3   X4  X5  X6   X7   X8  X9  X10 X11 X12
> 1   0   0    0    1  11  18   16   12   9   12  25  19
> 2  14  28   84   56  54  40  114   86  43   91 150  83
> 3  12   8   18   15  12  10   32   19  27   31  44  21
> 4   0   0    0    0   0   0    0    0   0    0   0   0
> 5   4   6    8    3   7  12   22   44  14    1   1   2
> 6 899 725 1563 1342 173 129 1072 1607 172 1184 720 524
> > head(yN)
>        X1   X2   X3   X4  X5  X6  X7   X8  X9 X10 X11 X12
> [1,]    1    1    0    1  13  22   7    7  13   8  13  17
> [2,]   39   64   81   56  63  51  49   53  65  58  77  76
> [3,]   29   18   17   15  14  13  13   11  39  20  22  19
> [4,]    4    3    0    0   0   1   0    0   1   0   0   0
> [5,]   10   13    8    3   8  15  10   28  21   0   0   2
> [6,] 2306 1652 1497 1342 201 164 468 1001 261 752 363 476
> > sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] C/en_US.UTF-8/C/C/C/C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] tweeDEseq_1.0.11
> 
> loaded via a namespace (and not attached):
> [1] MASS_7.3-16  edgeR_2.4.3  limma_3.10.2 tools_2.14.1