[BioC] Question about median of replicates
James W. MacDonald
jmacdon at uw.edu
Thu Jul 31 17:34:12 CEST 2014
Hi Sandra,
The logFC is the coefficient from your model, which estimates the
difference in logCPM between groups. It is not based on the sum of the
counts from each group. It will be rather close to the difference between
the group means of the log counts per million (logCPM) that you can
compute from your data, but not exactly equal to it.
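To make the mean-versus-sum distinction concrete, here is a small base R sketch with made-up numbers for a single gene (one Control sample and two Group1 samples, as in your design). The counts, library sizes, and variable names are all hypothetical, and this is only the rough by-hand approximation, not what edgeR does internally:

```r
# Toy numbers for one gene (hypothetical, not real data):
# one Control sample and two Group1 samples, equal library sizes.
counts  <- c(Control = 100, Group1_a = 210, Group1_b = 190)
libsize <- c(Control = 1e6, Group1_a = 1e6, Group1_b = 1e6)
group   <- c("Control", "Group1", "Group1")

# log2 counts per million for each sample
logcpm <- log2(counts / libsize * 1e6)

# Difference of group means on the log scale (roughly what logFC reflects)
by_mean <- mean(logcpm[group == "Group1"]) - mean(logcpm[group == "Control"])

# Log ratio of raw summed counts (NOT what logFC reflects)
by_sum <- log2(sum(counts[group == "Group1"]) / sum(counts[group == "Control"]))

print(c(by_mean = by_mean, by_sum = by_sum))
```

With two Group1 replicates, a sum-based value would be larger than the mean-based one by about log2(2) = 1, so a logFC that agrees with your lab checks is unlikely to be a sum.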
This is because the coefficients are estimated internally by edgeR, and
cannot be computed directly. As an example, I ran the example for glmFit
(just to get some data), and then modified slightly to conform to your
experiment:
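library(edgeR)
example(glmFit)   # simulates counts y and defines dispersion.true
# Adapted to a two-group design with three replicates each;
# y needs six columns (one per library) to match this design.
nlibs <- 6
x <- factor(rep(1:2, each=3), labels = c("Trt","Cont"))
design <- model.matrix(~x)
d <- DGEList(y)
d <- calcNormFactors(d)
fit <- glmFit(d, design, dispersion=dispersion.true)
results <- glmLRT(fit, coef=2)
topTags(results)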
Coefficient:  xCont
           logFC   logCPM        LR       PValue        FDR
Gene60 -2.510450 13.90319 11.493249 0.0006984944 0.06984944
Gene95 -2.006865 13.82370  7.636606 0.0057195447 0.27359986
Gene18  2.191870 13.56029  6.987521 0.0082079958 0.27359986
Gene23 -1.873228 13.74293  6.450792 0.0110902864 0.27725716
Then we can compute the difference in mean logCPM between the two groups
for the top-ranked gene (Gene60):
z <- rowMeans(cpm(d, log=TRUE)[,4:6]) - rowMeans(cpm(d, log=TRUE)[,1:3])
z[60]
Gene60
-2.797037
So you can see that the value I get when I compute by hand is close to
the value reported by edgeR, but not the same. This is because there is
no closed form solution for the model we are fitting (i.e., you can't
just calculate the answer by hand), so the coefficients have to be
estimated iteratively by edgeR.
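If it helps to see what "estimated iteratively" means, below is a rough base R sketch of Fisher scoring for a negative binomial GLM with a log link, for one gene. Everything in it is an assumption made for illustration: the counts, library sizes, and dispersion are invented, and edgeR has its own fitting code, so this is not edgeR's implementation and will not reproduce its numbers.

```r
# Toy data for one gene: three replicates per group (made-up numbers)
counts  <- c(12, 30, 21, 95, 60, 140)
libsize <- c(1.0e6, 1.4e6, 0.9e6, 1.2e6, 0.8e6, 1.5e6)
group   <- factor(rep(c("Trt", "Cont"), each = 3), levels = c("Trt", "Cont"))
X       <- model.matrix(~group)
offset  <- log(libsize)   # library sizes enter as an offset
phi     <- 0.1            # dispersion, assumed known here

# Start from a common rate for both groups, then refine step by step
beta <- c(log(mean(counts / libsize)), 0)
for (iter in 1:50) {
  eta <- drop(X %*% beta) + offset
  mu  <- exp(eta)
  w   <- mu / (1 + phi * mu)                   # working weights
  z   <- (eta - offset) + (counts - mu) / mu   # working response
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  cat(sprintf("iter %d: group coefficient = %.8f\n", iter, beta_new[2]))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Coefficient is on the natural log scale; convert to log2
logFC_glm <- unname(beta[2] / log(2))

# Naive by-hand value: difference in mean log2 CPM
logcpm      <- log2(counts / libsize * 1e6)
logFC_naive <- mean(logcpm[4:6]) - mean(logcpm[1:3])

print(c(glm = logFC_glm, naive = logFC_naive))
```

The estimate changes from one iteration to the next until it settles, and the settled value is generally close to the naive difference in mean logCPM without matching it exactly.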
Best,
Jim
On 7/30/2014 6:44 PM, Sandra Fernandez Moya wrote:
> Hello, I have a very important question, because we are going to submit a
> paper in a few hours and we have just realized that we may have an error,
> so I want to check with experts beforehand. I used edgeR to compare 2
> groups, Control and Group1, with 3 samples (Control: 1, Group1: 2). I
> followed the basic protocol, because I don't know much about this kind of
> analysis, and I got a final logFC that makes sense and that we also
> checked in the lab. But now the referees have asked whether the logFC was
> computed from the normalized Group1 data using the mean of both samples,
> and we realized that the software seems to take not the mean but the sum
> of the counts from each sample of Group1 for the analysis. Maybe I did
> something wrong, but can you confirm this? It shouldn't be the case, but
> does edgeR sum the counts of each replicate and use that for the analysis?
> Thanks a lot, and I await your answer!
> Sandra
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099