[BioC] Question about median of replicates

James W. MacDonald jmacdon at uw.edu
Thu Jul 31 17:34:12 CEST 2014


Hi Sandra,

The logFC is the coefficient from your model that estimates the
difference in logCPM between groups. It is not based on the sum of the
counts from each group. It will be close to, but not exactly equal to,
the difference between the groups' mean log counts per million (logCPM),
which you can compute from your data.
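
To make the contrast with summed counts concrete, here is a toy
illustration in base R. The counts are made up (one control library and
two Group1 libraries, all assumed to have the same library size), and
this is not edgeR's actual computation, only a sketch of why summing
replicates would inflate the fold change while averaging would not:

```r
# Hypothetical counts for one gene; equal library sizes are assumed
control <- c(100)
group1  <- c(210, 190)

# Summing replicates doubles Group1 simply because it has two libraries
logFC_sum  <- log2(sum(group1) / sum(control))

# Averaging replicates puts both groups on a per-library scale
logFC_mean <- log2(mean(group1) / mean(control))

c(sum = logFC_sum, mean = logFC_mean)
```

Here the summed version comes out one log2 unit larger than the averaged
one, purely because Group1 has twice as many libraries.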

This is because the coefficients are estimated internally by edgeR, and 
cannot be computed directly. As an example, I ran the example for glmFit 
(just to get some data), and then modified it slightly to conform to 
your experiment:

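library(edgeR)
example(glmFit)   # run the glmFit example just to get some simulated data
nlibs <- 6
# two groups of three libraries each; y is assumed to have six columns
x <- factor(rep(1:2, each=3), labels = c("Trt","Cont"))
design <- model.matrix(~x)
d <- DGEList(y)
d <- calcNormFactors(d)
fit <- glmFit(d, design, dispersion=dispersion.true)
results <- glmLRT(fit, coef=2)   # test the xCont coefficient (Cont vs Trt)
topTags(results)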
Coefficient:  xCont
            logFC   logCPM        LR       PValue        FDR
Gene60 -2.510450 13.90319 11.493249 0.0006984944 0.06984944
Gene95 -2.006865 13.82370  7.636606 0.0057195447 0.27359986
Gene18  2.191870 13.56029  6.987521 0.0082079958 0.27359986
Gene23 -1.873228 13.74293  6.450792 0.0110902864 0.27725716

Then we can compute the difference in mean logCPM between the two groups 
for the top-ranked gene (Gene60):

lcpm <- cpm(d, log=TRUE)
z <- rowMeans(lcpm[,4:6]) - rowMeans(lcpm[,1:3])   # Cont minus Trt
z[60]
   Gene60
-2.797037

So you can see that the value I get when I compute by hand is close to 
the value reported by edgeR, but not the same. This is because there is 
no closed form solution for the model we are fitting (i.e., you can't 
just calculate the answer by hand), so the coefficients have to be 
estimated iteratively by edgeR.
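
For a rough feel of what iterative estimation looks like, the sketch
below fits a negative binomial GLM to a single gene using base R's glm()
with the negative.binomial() family from the MASS package. The counts,
library sizes, and dispersion (theta = 10, i.e. a dispersion of 0.1) are
invented for illustration, and this is a stand-in for what edgeR does
internally rather than a reproduction of it (edgeR has its own fitting
routines and also uses normalization factors):

```r
# Hypothetical data for one gene: three Trt and three Cont libraries
counts  <- c(12, 30, 7, 95, 60, 140)
libsize <- c(1.0e6, 2.5e6, 0.8e6, 1.2e6, 0.9e6, 2.0e6)
group   <- factor(rep(c("Trt", "Cont"), each = 3), levels = c("Trt", "Cont"))

# Negative binomial GLM with a log link, library size as an offset,
# and the dispersion held fixed (theta = 1 / dispersion)
fit <- glm(counts ~ group + offset(log(libsize)),
           family = MASS::negative.binomial(theta = 10))

# The coefficient is on the natural-log scale; convert it to log2
logFC_glm <- coef(fit)[["groupCont"]] / log(2)

# Hand computation: difference in mean log2 CPM (Cont minus Trt)
logcpm     <- log2(counts / libsize * 1e6)
logFC_hand <- mean(logcpm[4:6]) - mean(logcpm[1:3])

fit$iter                               # iterations used by the fit
c(glm = logFC_glm, hand = logFC_hand)  # close, but not identical
```

fit$iter reports how many iterations the fitting algorithm needed, and
the two logFC values should be close but not identical, which mirrors
the comparison above.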

Best,

Jim

On 7/30/2014 6:44 PM, Sandra Fernandez Moya wrote:
> Hello, I have a very important question, because we are going to
> submit a paper in a few hours and now we realize that maybe we have an
> error, so I want to check with experts beforehand. The thing is, I used
> edgeR for a comparison of 2 groups, Control and Group1, and 3 samples,
> Control:1 and Group1:2. I followed the basic protocol, because I don't
> know much about this kind of analysis, and I get a final logFC that
> makes sense and that we also checked in the lab. But now the referees
> have asked whether the logFC was from the normalized Group1 data, using
> the mean of both, and we realize that it was not the mean; it seems to
> be the sum of the counts from each sample of Group1 that the software
> takes for the analysis. Maybe I did something wrong, but can you
> confirm this? It shouldn't be, but does edgeR sum the counts of each
> replicate and use that for the analysis?
>
> Thanks a lot, and I wait for the answer!
>
> Sandra

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


