[BioC] TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq

Smith, Hilary A hilary.smith at gatech.edu
Sat Nov 19 14:08:16 CET 2011


Thank you very much, and thank you for posting the code to allow baySeq to use the calcNormFactors/TMM normalization.
Best,
Hilary

----- Original Message -----
From: "Gordon K Smyth" <smyth at wehi.EDU.AU>
To: "Hilary A Smith" <hilary.smith at gatech.edu>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>, "Thomas J Hardcastle" <tjh48 at cam.ac.uk>
Sent: Friday, November 18, 2011 10:10:23 PM
Subject: TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq

Dear Hilary and Thomas,

The calcNormFactors() argument formerly called "quantile" was renamed to 
"p" in the edgeR package in Bioc-devel on 10 July, because the quantity is 
a probability and not a quantile.

At the same time, the option method="quantile" was renamed to 
method="upperquartile", to better match the original terminology for 
Bullard et al (2010) paper and to distinguish it from full quantile 
normalization now being proposed by a number of authors.

Best wishes
Gordon

> Date: Thu, 17 Nov 2011 10:07:31 -0500 (EST)
> From: "Smith, Hilary A" <hilary.smith at gatech.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] TMM and calcNormFactors: Normalization in baySeq to
> 	match	edgeR and DESeq
>
> Hello,

> I'm working on a couple analyses (currently pairwise) for 3'-DGE. Using 
> baySeq, edgeR, and DESeq are yielding different answers; specifically 
> DESeq and baySeq find different subsets of the genes found by edgeR. In 
> trying to isolate the discrepancy, I've been trying to make items like 
> normalization procedures similar to see if that improves congruency, or 
> if the differences merely stem from how the pairwise tests are run and 
> use of bayesian vs. exact-type statistics. I saw that baySeq's function 
> "getLibsizes" can use the edgeR implementation of TMM, but when I try to 
> do this I get an error message about a quantile argument not being used. 
> This error appears whether or not I specify a quantile, and I'm further 
> confused because the edgeR program itself does not require specifying 
> quantiles for its TMM-based calcNormFactors. EdgeR seems to run fine so 
> I think the problem is in the implementation of baySeq; perhaps I'm 
> misunderstanding/coding something? Any help is greatly appreciated; 
> commands excerpted from an R session are below.
>
>
>> library(baySeq)
>
> Attaching package: 'baySeq'
>
> The following object(s) are masked from 'package:base':
>
>    rbind
>
>> library(snow)
>> cl = makeCluster(4, "SOCK")
>> library(edgeR)
>> simData = read.delim(file="2011.11.03counts.txt", header=TRUE)
>> rownames(simData)=simData$CompID
>> simData=simData[,-1]
>> simData=as.matrix(simData)
>> head(simData)
>          X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F
> comp0      1065  1159  1207  1572  1477  1817  1841   605  1915  1113  1645
> comp1       544   534   341   675   333   739   690   236   502   451   571
> comp10    30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575
> comp100    1060  1065   999  1332   918  1620  1697   658  1117   861  1336
> comp1000    130   157   229   266   141   247   263   135   182   188   168
> comp10000    35    14    15    37    10    47    28    17    22    21    12
>          X3P_R
> comp0      1732
> comp1       799
> comp10    51243
> comp100    1370
> comp1000    244
> comp10000    64
>> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R")
>> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2))
>> cD = new("countData", data = simData, replicates = replicates, groups=groups)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>  unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM")
> Error in match.arg(estimationType) :
>  'arg' should be one of "quantile", "total", "edgeR"
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75)
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>  unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>  unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75))
> Error in match.arg(estimationType) : 'arg' must be of length 1
>> calcNormFactors(cD)
> Error in calcNormFactors(cD) :
>  calcNormFactors() only operates on 'matrix' and 'DGEList' objects
>> calcNormFactors(simData)
>    X1E_F     X1E_R     X2E_F     X2E_R     X3E_F     X3E_R     X1P_F     X1P_R
> 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707
>    X2P_F     X2P_R     X3P_F     X3P_R
> 1.0550414 0.8955669 1.0869486 1.1052472
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>  unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>  unused argument(s) (quantile = quantile)

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list