[BioC] TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq
Smith, Hilary A
hilary.smith at gatech.edu
Sat Nov 19 14:08:16 CET 2011
Thank you very much, and thank you for posting the code to allow baySeq to use the calcNormFactors/TMM normalization.
Best,
Hilary
----- Original Message -----
From: "Gordon K Smyth" <smyth at wehi.EDU.AU>
To: "Hilary A Smith" <hilary.smith at gatech.edu>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>, "Thomas J Hardcastle" <tjh48 at cam.ac.uk>
Sent: Friday, November 18, 2011 10:10:23 PM
Subject: TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq
Dear Hilary and Thomas,
The calcNormFactors() argument formerly called "quantile" was renamed to
"p" in the edgeR package in Bioc-devel on 10 July, because the quantity is
a probability and not a quantile.
At the same time, the option method="quantile" was renamed to
method="upperquartile", to better match the original terminology of the
Bullard et al (2010) paper and to distinguish it from the full quantile
normalization now being proposed by a number of authors.
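
For illustration, the renamed argument and option can be exercised as in the short sketch below. The count matrix is simulated and purely hypothetical (it is not the data from this thread), and the sketch assumes an edgeR version released after the rename described above.

```r
library(edgeR)

# Hypothetical simulated counts: 100 tags x 6 libraries (not the poster's data)
set.seed(1)
counts <- matrix(rnbinom(600, mu = 100, size = 5), ncol = 6,
                 dimnames = list(paste0("comp", 1:100), paste0("S", 1:6)))

# TMM is the default method; neither "p" nor "quantile" needs to be given
nf_tmm <- calcNormFactors(counts, method = "TMM")

# Upper-quartile normalization under the new names:
# method = "upperquartile" (formerly "quantile"), p = 0.75 (formerly quantile = 0.75)
nf_uq <- calcNormFactors(counts, method = "upperquartile", p = 0.75)

# The old spellings are expected to fail after the rename, e.g.
# calcNormFactors(counts, method = "quantile", quantile = 0.75)

print(nf_tmm)
print(nf_uq)
```

Code that still passes quantile= through to calcNormFactors() will stop with an "unused argument" error of the kind reported in the original message.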
Best wishes
Gordon
> Date: Thu, 17 Nov 2011 10:07:31 -0500 (EST)
> From: "Smith, Hilary A" <hilary.smith at gatech.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] TMM and calcNormFactors: Normalization in baySeq to
> match edgeR and DESeq
>
> Hello,
> I'm working on a couple of analyses (currently pairwise) for 3'-DGE.
> baySeq, edgeR, and DESeq are yielding different answers; specifically,
> DESeq and baySeq find different subsets of the genes found by edgeR. In
> trying to isolate the discrepancy, I've been trying to make items like
> normalization procedures similar to see if that improves congruency, or
> if the differences merely stem from how the pairwise tests are run and
> the use of Bayesian vs. exact-type statistics. I saw that baySeq's function
> "getLibsizes" can use the edgeR implementation of TMM, but when I try to
> do this I get an error message about a quantile argument not being used.
> This error appears whether or not I specify a quantile, and I'm further
> confused because edgeR itself does not require specifying
> quantiles for its TMM-based calcNormFactors. edgeR seems to run fine, so
> I think the problem is in the implementation in baySeq; perhaps I'm
> misunderstanding or miscoding something? Any help is greatly appreciated;
> commands excerpted from an R session are below.
>
>
>> library(baySeq)
>
> Attaching package: 'baySeq'
>
> The following object(s) are masked from 'package:base':
>
> rbind
>
>> library(snow)
>> cl = makeCluster(4, "SOCK")
>> library(edgeR)
>> simData = read.delim(file="2011.11.03counts.txt", header=TRUE)
>> rownames(simData)=simData$CompID
>> simData=simData[,-1]
>> simData=as.matrix(simData)
>> head(simData)
>           X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F
> comp0      1065  1159  1207  1572  1477  1817  1841   605  1915  1113  1645
> comp1       544   534   341   675   333   739   690   236   502   451   571
> comp10    30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575
> comp100    1060  1065   999  1332   918  1620  1697   658  1117   861  1336
> comp1000    130   157   229   266   141   247   263   135   182   188   168
> comp10000    35    14    15    37    10    47    28    17    22    21    12
>           X3P_R
> comp0      1732
> comp1       799
> comp10    51243
> comp100    1370
> comp1000    244
> comp10000    64
>> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R")
>> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2))
>> cD = new("countData", data = simData, replicates = replicates, groups=groups)
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM")
> Error in match.arg(estimationType) :
> 'arg' should be one of "quantile", "total", "edgeR"
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75)
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75))
> Error in match.arg(estimationType) : 'arg' must be of length 1
>> calcNormFactors(cD)
> Error in calcNormFactors(cD) :
> calcNormFactors() only operates on 'matrix' and 'DGEList' objects
>> calcNormFactors(simData)
>     X1E_F     X1E_R     X2E_F     X2E_R     X3E_F     X3E_R     X1P_F     X1P_R
> 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707
>     X2P_F     X2P_R     X3P_F     X3P_R
> 1.0550414 0.8955669 1.0869486 1.1052472
>> cD@libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD@libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
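
Since calcNormFactors(simData) succeeds in the session above, one possible way to obtain TMM-scaled library sizes without going through getLibsizes() is to multiply the column totals by the TMM factors. This is only a sketch under stated assumptions: it is not necessarily the code posted to the list, the count matrix is simulated and hypothetical, and the final assignment to the libsizes slot is an assumption based on the slot used in the session.

```r
library(edgeR)

# Hypothetical simulated counts standing in for simData
set.seed(1)
counts <- matrix(rnbinom(600, mu = 100, size = 5), ncol = 6,
                 dimnames = list(paste0("comp", 1:100), paste0("S", 1:6)))

# TMM normalization factors (default method), one per library
nf <- calcNormFactors(counts)

# Effective library size = column total x normalization factor
eff_libsizes <- colSums(counts) * nf
print(eff_libsizes)

# Assumption: a baySeq countData object cD built from the same matrix,
# with the libsizes slot as used in the session above:
# cD@libsizes <- eff_libsizes
```

The factors returned by calcNormFactors() are scaled to multiply to one, so the effective library sizes stay on the same overall scale as the raw column totals.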