[BioC] TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq

Wolfgang Huber whuber at embl.de
Thu Nov 17 16:21:21 CET 2011


Dear Hilary

not a reply to your question on normalisation function parameters, but 
to the underlying one about method comparison. You could try these things:

- What happens if you simply compare biological replicates ? In such 
cases you want that the fraction of genes gettting an (unadjusted) 
p-value of, say, 5%, is less or equal to 5%.

- Try a pairs() plot of the logarithms of the p-value values from each 
method, to see whether it is simply a cutoff effect.

- Plot the detection call as a function of average count or of (within 
group) standard deviation, to see whether any biases are correlated with 
these.

	Best wishes
	Wolfgang



On 11/17/11 4:07 PM, Smith, Hilary A wrote:
> Hello,
> I'm working on a couple analyses (currently pairwise) for 3'-DGE. Using baySeq, edgeR, and DESeq are yielding different answers; specifically DESeq and baySeq find different subsets of the genes found by edgeR. In trying to isolate the discrepancy, I've been trying to make items like normalization procedures similar to see if that improves congruency, or if the differences merely stem from how the pairwise tests are run and use of bayesian vs. exact-type statistics. I saw that baySeq's function "getLibsizes" can use the edgeR implementation of TMM, but when I try to do this I get an error message about a quantile argument not being used. This error appears whether or not I specify a quantile, and I'm further confused because the edgeR program itself does not require specifying quantiles for its TMM-based calcNormFactors. EdgeR seems to run fine so I think the problem is in the implementation of baySeq; perhaps I'm misunderstanding/coding something? Any help is greatly appre
c!
>   iated; commands excerpted from an R session are below.
>
>
>> library(baySeq)
>
> Attaching package: 'baySeq'
>
> The following object(s) are masked from 'package:base':
>
>      rbind
>
>> library(snow)
>> cl = makeCluster(4, "SOCK")
>> library(edgeR)
>> simData = read.delim(file="2011.11.03counts.txt", header=TRUE)
>> rownames(simData)=simData$CompID
>> simData=simData[,-1]
>> simData=as.matrix(simData)
>> head(simData)
>            X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F
> comp0      1065  1159  1207  1572  1477  1817  1841   605  1915  1113  1645
> comp1       544   534   341   675   333   739   690   236   502   451   571
> comp10    30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575
> comp100    1060  1065   999  1332   918  1620  1697   658  1117   861  1336
> comp1000    130   157   229   266   141   247   263   135   182   188   168
> comp10000    35    14    15    37    10    47    28    17    22    21    12
>            X3P_R
> comp0      1732
> comp1       799
> comp10    51243
> comp100    1370
> comp1000    244
> comp10000    64
>> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R")
>> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2))
>> cD = new("countData", data = simData, replicates = replicates, groups=groups)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>    unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM")
> Error in match.arg(estimationType) :
>    'arg' should be one of "quantile", "total", "edgeR"
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75)
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>    unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>    unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75))
> Error in match.arg(estimationType) : 'arg' must be of length 1
>> calcNormFactors(cD)
> Error in calcNormFactors(cD) :
>    calcNormFactors() only operates on 'matrix' and 'DGEList' objects
>> calcNormFactors(simData)
>      X1E_F     X1E_R     X2E_F     X2E_R     X3E_F     X3E_R     X1P_F     X1P_R
> 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707
>      X2P_F     X2P_R     X3P_F     X3P_R
> 1.0550414 0.8955669 1.0869486 1.1052472
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>    unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
>    unused argument(s) (quantile = quantile)
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list