[BioC] TMM and calcNormFactors: Normalization in baySeq to match edgeR and DESeq
Wolfgang Huber
whuber at embl.de
Thu Nov 17 16:21:21 CET 2011
Dear Hilary
not a reply to your question on normalisation function parameters, but
to the underlying one about method comparison. You could try these things:
- What happens if you simply compare biological replicates ? In such
cases you want that the fraction of genes gettting an (unadjusted)
p-value of, say, 5%, is less or equal to 5%.
- Try a pairs() plot of the logarithms of the p-value values from each
method, to see whether it is simply a cutoff effect.
- Plot the detection call as a function of average count or of (within
group) standard deviation, to see whether any biases are correlated with
these.
Best wishes
Wolfgang
On 11/17/11 4:07 PM, Smith, Hilary A wrote:
> Hello,
> I'm working on a couple analyses (currently pairwise) for 3'-DGE. Using baySeq, edgeR, and DESeq are yielding different answers; specifically DESeq and baySeq find different subsets of the genes found by edgeR. In trying to isolate the discrepancy, I've been trying to make items like normalization procedures similar to see if that improves congruency, or if the differences merely stem from how the pairwise tests are run and use of bayesian vs. exact-type statistics. I saw that baySeq's function "getLibsizes" can use the edgeR implementation of TMM, but when I try to do this I get an error message about a quantile argument not being used. This error appears whether or not I specify a quantile, and I'm further confused because the edgeR program itself does not require specifying quantiles for its TMM-based calcNormFactors. EdgeR seems to run fine so I think the problem is in the implementation of baySeq; perhaps I'm misunderstanding/coding something? Any help is greatly appre
c!
> iated; commands excerpted from an R session are below.
>
>
>> library(baySeq)
>
> Attaching package: 'baySeq'
>
> The following object(s) are masked from 'package:base':
>
> rbind
>
>> library(snow)
>> cl = makeCluster(4, "SOCK")
>> library(edgeR)
>> simData = read.delim(file="2011.11.03counts.txt", header=TRUE)
>> rownames(simData)=simData$CompID
>> simData=simData[,-1]
>> simData=as.matrix(simData)
>> head(simData)
> X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F
> comp0 1065 1159 1207 1572 1477 1817 1841 605 1915 1113 1645
> comp1 544 534 341 675 333 739 690 236 502 451 571
> comp10 30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575
> comp100 1060 1065 999 1332 918 1620 1697 658 1117 861 1336
> comp1000 130 157 229 266 141 247 263 135 182 188 168
> comp10000 35 14 15 37 10 47 28 17 22 21 12
> X3P_R
> comp0 1732
> comp1 799
> comp10 51243
> comp100 1370
> comp1000 244
> comp10000 64
>> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R")
>> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2))
>> cD = new("countData", data = simData, replicates = replicates, groups=groups)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM")
> Error in match.arg(estimationType) :
> 'arg' should be one of "quantile", "total", "edgeR"
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75)
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75))
> Error in match.arg(estimationType) : 'arg' must be of length 1
>> calcNormFactors(cD)
> Error in calcNormFactors(cD) :
> calcNormFactors() only operates on 'matrix' and 'DGEList' objects
>> calcNormFactors(simData)
> X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R
> 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707
> X2P_F X2P_R X3P_F X3P_R
> 1.0550414 0.8955669 1.0869486 1.1052472
>> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>> cD at libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR")
> Calculating library sizes from column totals.
> Error in calcNormFactors(d, quantile = quantile, ...) :
> unused argument(s) (quantile = quantile)
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
More information about the Bioconductor
mailing list