[BioC] Differences between limma voom E values and edgeR cpm values?

Wed Jun 4 01:51:27 CEST 2014

Dear John,

Yes, it is true that you can't reproduce exactly the voom log-cpm values 
in edgeR.  The reasons for this are somewhat subtle.

First, why does edgeR allow choice of prior.count while voom presets it at 
0.5?  The edgeR logCPM values are only for descriptive purposes, so it is 
easy to compute them in different ways.  Allowing a choice of prior.count 
values allows users to choose where they want to be in the noise-bias 
trade-off.  Choosing a large prior.count may be valuable to damp down the 
variability of small count cpm values.  In voom, changes to prior.count 
cannot easily be made because it would affect the whole downstream 
analysis process.  Other prior.count values may not give the nice 
predictable mean-variance trend that we see with 0.5.  Nor does voom need 
the different choices, because it is able to deal with decreased precision 
at low count values by assigning lower precision weights.
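
As a rough illustration of the noise-bias trade-off, here is a minimal 
base-R sketch.  It is not the edgeR implementation: the helper name 
logcpm and the count values are hypothetical, and it assumes a single 
library size, in which case the scaled prior count described in this 
thread equals prior.count itself.

```r
# Hypothetical helper: log2 counts-per-million with an additive prior count.
# Assumes equal library sizes, so the scaled prior equals prior.count.
logcpm <- function(counts, lib.size, prior.count) {
  log2((counts + prior.count) / (lib.size + 2 * prior.count) * 1e6)
}

counts   <- c(0, 1, 2, 4)   # made-up small counts for one gene
lib.size <- 1e6             # made-up library size

# Spread (max minus min) of the log-cpm values under two prior counts
spread.small <- diff(range(logcpm(counts, lib.size, prior.count = 0.5)))
spread.large <- diff(range(logcpm(counts, lib.size, prior.count = 5)))

spread.small   # log2(4.5 / 0.5) = log2(9)
spread.large   # log2(9 / 5), much smaller
```

The larger prior count pulls the small-count values together (less 
noise), at the price of shrinking real differences towards zero (more 
bias).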

Why is the prior.count scaled to library size in edgeR?  Because this 
ensures that any fold change that was equal to 1 before the prior.count 
was added stays equal to 1 after adding.  In voom, however, the 
prior.counts are not scaled because the mean-variance modelling in voom 
requires the size of the counts to have an absolute meaning, not relative 
to library size.  Scaling the prior.count interferes with the variance 
modelling.  Empirical testing shows that voom performs very well for very 
unequal library sizes, so the cost of not scaling doesn't seem to be 
great.
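
The effect of scaling can be checked with a small base-R sketch that 
applies the two formulas quoted in this thread by hand, without calling 
edgeR or limma.  The two-sample data are made up: the gene has the same 
proportion (2 per million) in both libraries, so the fold change is 1 
before any prior count is added.

```r
# Made-up example: one gene, two samples, library sizes differing 10-fold.
lib.size    <- c(1e6, 1e7)
counts      <- c(2, 20)      # 2 counts per million in both samples
prior.count <- 0.5

# Prior count scaled to library size (edgeR-style formula from this thread)
prior.scaled  <- lib.size / mean(lib.size) * prior.count
logcpm.scaled <- log2((counts + prior.scaled) /
                      (lib.size + 2 * prior.scaled) * 1e6)

# Fixed prior count of 0.5 (voom-style formula from this thread)
logcpm.fixed <- log2((counts + 0.5) / (lib.size + 1) * 1e6)

diff(logcpm.scaled)  # log fold change stays at 0 (up to rounding error)
diff(logcpm.fixed)   # log fold change is no longer 0
```

With the scaled prior the log fold change between the two samples stays 
at zero, whereas the fixed prior of 0.5 inflates the sample with the 
smaller library relative to the larger one.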

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Thu, 29 May 2014, John Brothers II wrote:

> Hello,
>
> I have a quick question about E-values in voom versus cpm from edgeR
>
> E-values from voom are calculated in the following way ->
>
> t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))
>
> If I understand this correctly, this is log2 counts per million of counts
> with a pseudo-count of 0.5, normalized on the library size + 2 *
> pseudocount (which was manually set to 0.5)
>
> However, the cpm function in edgeR is slightly different when you want to use
> cpm(x, log=T, prior.count=0.5).
> It calculates the following:
> # First scales the prior.count/pseudo-count and adds 2x the scaled prior
> # count to the libsize
> prior.count.scaled <- lib.size/mean(lib.size)*prior.count
> lib.size <- lib.size+2*prior.count.scaled
> lib.size <- 1e-6*lib.size
> # Calculates log2
> log2(t( (t(x)+prior.count.scaled) / lib.size ))
>
> Is there a reason the pseudocount/prior-count is able to be set by the user
> and then scaled to library size in the edgeR cpm function, but is manually
> set as 0.5 regardless of library size in voom?
>
> That's the only difference I see between the E-value calculation and the
> cpm function (and when I choose a value for the prior.count that returns a
> prior.count.scaled value equal to 0.5, I then get the same values for cpm
> in edgeR as I would when using voom E values).
>
> Thanks,
>
> John
>
