[BioC] edgeR: pseudocounts, logConc and logFC

Fri Apr 9 17:38:39 CEST 2010

I am experimenting with edgeR for high-throughput (next-gen) sequencing
data and proteomics spectral count data and have a few questions.

1.  Is it correct to think of the pseudocounts (pseudo.alt produced by
estimateCommonDisp) as normalized counts?  According to the edgeR
vignette “The pseudocounts are calculated using a quantile-to-quantile
method for the negative binomial so that the library sizes for the
pseudocounts are equal to the geometric mean of the original library
sizes.”  For the data that I am working with, the column sums for
pseudo.alt are very close to common.lib.size, but the boxplots of the
columns do not “line up”.  Is this because the pseudocounts are “generated under
the alternative hypothesis”?
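
To illustrate what I mean by equal totals without matching boxplots,
here is a toy base-R sketch.  The counts are made up and it does not
call edgeR at all; it only shows that two libraries can have the same
column sum while their quantiles (what a boxplot draws) differ.

```r
# Toy counts for two libraries (hypothetical numbers, not real data).
lib_a <- c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10)
lib_b <- c(0, 0, 1, 1, 2, 2, 4, 10, 30, 50)

# Column sums: the quantity that the vignette says is equalised.
totals <- c(a = sum(lib_a), b = sum(lib_b))

# Medians: the centre line of each boxplot.
medians <- c(a = median(lib_a), b = median(lib_b))

# Full five-number summaries, i.e. what the boxes are built from.
five_num <- rbind(a = quantile(lib_a), b = quantile(lib_b))

print(totals)
print(medians)
print(five_num)
```

So equal library sizes alone would not force the boxplots to agree; my
question is whether that is all that is going on with pseudo.alt.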

2.  I noticed that within the estimatePs function, the minimum value
is set to 8.783496e-16.  I think the choice of this minimum will
affect the estimated logConc and logFC values, but will it affect the
test results (p-values)?
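
To make the concern concrete, here is a small base-R sketch.  It is not
the estimatePs code: the proportions and the alternative floor of 1e-10
are made up, and I am assuming that logConc is the average of the two
log2 proportions and logFC their difference.

```r
# Illustrative sketch only; not edgeR's estimatePs implementation.
floor_current <- 8.783496e-16   # minimum quoted above
floor_other   <- 1e-10          # hypothetical alternative floor

# Hypothetical proportions: tag absent in group 1, present in group 2.
p1_raw <- 0
p2_raw <- 1e-4

log_stats <- function(p1, p2, floor_value) {
  # Replace anything below the floor by the floor itself.
  p1 <- max(p1, floor_value)
  p2 <- max(p2, floor_value)
  c(logConc = (log2(p1) + log2(p2)) / 2,  # assumed definition
    logFC   = log2(p2 / p1))              # assumed definition
}

res_current <- log_stats(p1_raw, p2_raw, floor_current)
res_other   <- log_stats(p1_raw, p2_raw, floor_other)
print(rbind(current = res_current, other = res_other))
```

Under these assumptions the two floors give different logConc and
logFC values for the same counts, which is why I am asking whether the
p-values depend on the floor as well.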

3.  The ranges for logConc and logFC seem to differ between the graph
produced by plotSmear and the output of exactTest (for a single
comparison).  Specifically, for each of the examples in the edgeR
vignette (and in my own data examples), the minimum logConc in the
plotSmear graph is ~ -16, while in the table from topTags the minimum is
~32.   For logFC, the max shown by plotSmear is ~10, while the max in
topTags is ~40.   I tried changing xlim and ylim in plotSmear, so this
doesn’t seem to be an issue of setting the axes.

I am using edgeR_1.4.7 with R version 2.10.1.

Thanks!

Ann