[BioC] Possible Bugs in Contrasts Design of edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Mon Apr 29 10:17:12 CEST 2013
Dear Yang Liu,
I believe that the documentation is correct as it stands.
The coefficient "DiseaseHealthy:TreatmentHormone" represents the average
log-fold-change after hormone treatment for healthy patients. The
coefficient "DiseaseDisease1:TreatmentHormone" represents the average
log-fold-change after hormone treatment for patients in disease group 1.
So taking the difference of these two coefficients represents the
contrast as described in the documentation.
The linear model here is equivalent to three paired t-tests, one for each
disease group. In effect we are fitting an additive model
(patient+treatment) within each disease group.
You are correct in your interpretation of the intercept term, but this is
not relevant because it cancels out of any contrast.
Best wishes
Gordon
---------------- original message ----------------
[BioC] Possible Bugs in Contrasts Design of edgeR
Yang Liu [guest] guest at bioconductor.org
Thu Apr 25 17:59:21 CEST 2013
Hi,
I am a PhD student from Dept. Statistics of Penn State University. I have
been using edgeR for my RNA-seq data recently. As I moved forward with
edgeR, I found there are probably some bugs in the latest documentation.
My experimental design is exactly the same as the design (Comparisons Both
Between and Within Subject) on page 32 in the latest documentation (last
revised on 31 March 2013). From page 32 to 34, everything looks fine to me
except for the contrasts defined at the end.
If I understand it properly, with the procedure showed on those pages, I
can estimate each effect in the design, which can be showed with
colnames(design). Then, all contrasts should be based on those estimated
effects. For the first contrast on page 34, it is trying to find genes
that respond differently to the hormone in disease1 vs healthy patients.
The contrast looks as follows:
lrt <- glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0))
However, it looks not right to me, and it only put weights on the 10th and
11th effects ("-1" and "1"), which are corresponding to
"DiseaseHealthy:TreatmentHormone" and "DiseaseDisease1:TreatmentHormone".
As I see, the intercept (baseline) represents the combined effect from
Healthy, Patient 1, and None Treatment. In order to find genes that
respond differently to the hormone in disease1 vs healthy patients
(regardless which patient it is), you could not leave the weights for all
4th to 9th effects as zeros. Also, as the contrast compares disease1 to
healthy, the weight in the contrast for the 2nd effect should not be zero
as well. I think the right contrast should be as follows:
lrt <- glmLRT(fit, contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1,
1, 0))
Please feel free to correct me if I am wrong.
Thanks,
Yang Liu
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.0.8 limma_3.14.4
--
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list