[BioC] edgeR: pseudocounts, logConc and logFC
hess at stat.colostate.edu
Fri Apr 9 17:38:39 CEST 2010
I am experimenting with edgeR for high-throughput (next-gen) sequencing
data and proteomics spectral count data and have a few questions.
1. Is it correct to think of the pseudocounts (pseudo.alt produced by
estimateCommonDisp) as normalized counts? According to the edgeR
vignette “The pseudocounts are calculated using a quantile-to-quantile
method for the negative binomial so that the library sizes for the
pseudocounts are equal to the geometric mean of the original library
sizes.” For the data that I am working with, the column sums for
pseudo.alt are very close to common.lib.size, but the per-sample boxplots
do not “line up”. Is this because the pseudocounts are “generated under
the alternative hypothesis”?
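To make the observation in question 1 concrete, here is a minimal sketch in plain Python (not edgeR code). It assumes the common library size is simply the geometric mean of the original library sizes, as the vignette quote suggests, and uses made-up numbers throughout. It also shows that two libraries can share the same total while their distributions, and hence their boxplots, differ.

```python
import math
from statistics import median

def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))); values must be > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical library sizes, not taken from any real data set.
lib_sizes = [1_000_000, 4_000_000]
common_lib_size = geometric_mean(lib_sizes)

# Two hypothetical libraries with identical totals but different shapes.
lib_a = [10, 20, 30, 40]
lib_b = [1, 2, 3, 94]
same_total = sum(lib_a) == sum(lib_b)

# Equal column sums do not force equal medians (or quartiles).
medians = (median(lib_a), median(lib_b))
```

So matching column sums alone would not guarantee that the boxplots line up; whether that is the whole explanation for pseudo.alt is what I am asking.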
2. I noticed that within the estimatePs function, the minimum value
is set to 8.783496e-16. I think the choice of this minimum will
affect the estimated logConc and logFC values, but will it affect the
test results (p-values)?
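To illustrate why I expect the floor to matter for the estimates in question 2, here is a minimal sketch in plain Python. It assumes that logConc is the average of the two groups' log2 proportions and logFC is their difference; that definition is my assumption for illustration and is not copied from the edgeR source. The function name and the example proportions are hypothetical.

```python
import math

# Floor value quoted above from estimatePs.
FLOOR = 8.783496e-16

def log_conc_and_fc(p1, p2, floor=FLOOR):
    """Return (logConc, logFC) on the log2 scale after flooring both proportions.

    ASSUMPTION: logConc is the mean of the two log2 proportions and logFC is
    their difference. This is an illustrative definition, not edgeR's code.
    """
    l1 = math.log2(max(p1, floor))
    l2 = math.log2(max(p2, floor))
    return (l1 + l2) / 2, l2 - l1

# Hypothetical tag: absent in group 1, proportion 2^-14 in group 2.
log_conc, log_fc = log_conc_and_fc(0.0, 2 ** -14)

# Same tag with a hypothetical larger floor of 2^-30.
alt_conc, alt_fc = log_conc_and_fc(0.0, 2 ** -14, floor=2 ** -30)
```

Under these assumptions the quoted floor gives a logConc near -32 and a logFC near 36 for this tag, and both values shift when the floor changes. What I cannot tell is whether the p-values shift as well.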
3. The ranges for logConc and logFC seem different when comparing
the graph produced by plotSmear and output produced by exactTest (for
a single comparison). Specifically, for each of the examples in the
edgeR vignette (and in my own data examples), the minimum logConc in
the plotSmear graph is ~ -16, while in the table from topTags the minimum is
~32. For logFC, the max shown in smearPlot is ~10, while the max in
topTags is ~40. After changing xlim and ylim in plotSmear, this
doesn’t seem to be an issue of setting the axes.
I am using edgeR_1.4.7 with R version 2.10.1.