[BioC] EdgeR LogCpm and LogFC values calculation

Fri Jan 31 14:13:39 CET 2014

Dear Koen,

The source code of the function aveLogCPM (included in the package source of the current release version 3.4.2) shows that the function mglmOneGroup needs some additional parameters and other default values than the ones you used:

prior.count <-  2
dispersion <-  0.05
# y is the  matrix of the raw counts
y <- as.matrix(y)
if(is.null(lib.size)) lib.size <- colSums(y)
prior.count.scaled <- lib.size/mean(lib.size) * prior.count
offset <- log(lib.size+2*prior.count.scaled)
abundance <- mglmOneGroup(t(t(y)+prior.count.scaled),dispersion=dispersion,offset=offset)
(abundance+log(1e6)) / log(2)

On a small test dataset this gives me the same values as the AveLogCPM component of a DGELRT object. Could you give this a try?

best,
Perry

---
Perry Moerland, PhD
Room J1B-215
Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics
Academic Medical Center, University of Amsterdam
Postbus 22660, 1100 DD Amsterdam, The Netherlands
tel: +31 20 5666945
p.d.moerland at amc.uva.nl, http://www.bioinformaticslaboratory.nl/

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Koen Van den Berge
Sent: Friday, January 31, 2014 1:08 PM
To: bioconductor at r-project.org
Subject: [BioC] EdgeR LogCpm and LogFC values calculation

Dear Bioconductor mailing list,

I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the
mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:

#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19]
#Delete removed sample 19 by authors

y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE) y.trended <- estimateGLMTrendedDisp(y.common, design) y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22) lrt <- glmLRT(fit,coef= c(5,6,9,8)) lrt$table$logCPM

#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE) LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical

mean(LogCpmstest) #not identical

mglmOneGroup(y.tagwise) #also non-identical

Have you got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong?

Any help would be greatly appreciated.

Sincerely, Koen.

        [[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
________________________________

AMC Disclaimer : http://www.amc.nl/disclaimer