[BioC] Possible Bugs in Contrasts Design of edgeR

Mon Apr 29 10:17:12 CEST 2013

Dear Yang Liu,

I believe that the documentation is correct as it stands.

The coefficient "DiseaseHealthy:TreatmentHormone" represents the average 
log-fold-change after hormone treatment for healthy patients.  The 
coefficient "DiseaseDisease1:TreatmentHormone" represents the average 
log-fold-change after hormone treatment for patients in disease group 1. 
So taking the difference of these two coefficients represents the 
contrast as described in the documentation.

The linear model here is equivalent to three paired t-tests, one for each 
disease group.  In effect we are fitting an additive model 
(patient+treatment) within each disease group.

You are correct in your interpretation of the intercept term, but this is 
not relevant because it cancels out of any contrast.

Best wishes
Gordon

---------------- original message ----------------

[BioC] Possible Bugs in Contrasts Design of edgeR
Yang Liu [guest] guest at bioconductor.org
Thu Apr 25 17:59:21 CEST 2013

Hi,

I am a PhD student from Dept. Statistics of Penn State University. I have 
been using edgeR for my RNA-seq data recently. As I moved forward with 
edgeR, I found there are probably some bugs in the latest documentation.

My experimental design is exactly the same as the design (Comparisons Both 
Between and Within Subject) on page 32 in the latest documentation (last 
revised on 31 March 2013). From page 32 to 34, everything looks fine to me 
except for the contrasts defined at the end.

If I understand it properly, with the procedure showed on those pages, I 
can estimate each effect in the design, which can be showed with 
colnames(design). Then, all contrasts should be based on those estimated 
effects. For the first contrast on page 34, it is trying to find genes 
that respond differently to the hormone in disease1 vs healthy patients. 
The contrast looks as follows:

lrt <- glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0))

However, it looks not right to me, and it only put weights on the 10th and 
11th effects ("-1" and "1"), which are corresponding to 
"DiseaseHealthy:TreatmentHormone" and "DiseaseDisease1:TreatmentHormone". 
As I see, the intercept (baseline) represents the combined effect from 
Healthy, Patient 1, and None Treatment. In order to find genes that 
respond differently to the hormone in disease1 vs healthy patients 
(regardless which patient it is), you could not leave the weights for all 
4th to 9th effects as zeros. Also, as the contrast compares disease1 to 
healthy, the weight in the contrast for the 2nd effect should not be zero 
as well. I think the right contrast should be as follows:

lrt <- glmLRT(fit, contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1, 
1, 0))

Please feel free to correct me if I am wrong.

Thanks,

Yang Liu

  -- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.0.8  limma_3.14.4

--

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}