[BioC] edgeR - ANOVA-like test for differential gene expression

Wed Jun 26 17:43:38 CEST 2013

Dear edgeR users,

I worked my way through the user's guide of edgeR. However, I still have 
some questions and problems understanding  the anova-like way of 
analyzing differential gene expression. I would be very grateful for any 
help or suggestion. I hope I did not overlook any earlier post 
concerning the same topic.

My RNAseq data are derived from 4 different developmental stages with 3 
biological replicates each (see overview). I am interested in genes 
differentially expressed between any of the developmental stages.
I am not sure, which way of analysis is best to answer this question. 
Making the "Anova-like" model asking for all contrasts at once or doing 
all possible contrasts one by one.

In principle the "anova-like" way of analysis should fit best to my 
question, if I understood the edgeR user's guide correctly. So I make a 
model matrix as given in "design" and fit the glm model for the effect 
of Development including the intercept. To ask for expression 
differences between any of the developmental stages I use "coef=2:4" in 
the call of glmLRT.

So my questions are:
Will coef=2:4 yield only the contrasts of Development2-Development1, 
Development3-Development1, Development4-Development1 (as it is described 
in the user's guide),
or will it also test the contrasts of Development3-Development2, 
Development4-Development2 and Development4-Development3, which would 
also be important contrasts for me to analyze?
For further filtering and candidate gene identification I would like to 
extract the logFoldChanges and significance level between any of the 
developmental stages.
If the "anova-like" analysis provides all of the six possible contrasts, 
is there an easy way to extract logFC and significance level for any of 
these contrasts from the resulting glmLRT-output?

If the anova-like analysis does NOT yield ALL of the possible contrasts, 
I would have to make the contrasts one by one.
In this case would it be more useful to build the glm model without an 
intercept (see design_2), and then ask for the six contrasts separately 
as indicated in the comment of glm_data_2?
Doing so it would be easy for me to get the logFC and the corresponding 
signficance level.
Is it conceptionally wrong to do the single contrasts instead of an 
anova-like analysis concerning the multiple testing problem (6 tests 
instead of 1)?

I will be very grateful for any comment on my questions,

Best regards,
Benedikt Digel

#R code:
overview = matrix(nrow=12, ncol=1)
     overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4))
         colnames(overview) = "Development"
         rownames(overview) = paste("library_", 1:12, sep="")

design = model.matrix(object = ~Development)
     fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not 
provide the corresponding data for the analysis, since it should just 
indicate the way I call glmFit
         glm_data = glmLRT(glmfit=fit, coef=2:4)

design_2 = model.matrix(object=~0+Development)
     fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2)     # I 
did not provide the corresponding data for the analysis, since it should 
just indicate the way I call glmFit
         glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # 
c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all 
possible contrasts to extract effects of all comparisons

-- 
Benedikt Digel geb. Drosse

PhD student
AG von Korff
Max Planck Institute for Plant Breeding Research
Dept. Plant Developmental Biology
Carl-von-Linné-Weg 10
50829 Cologne
Germany

Tel. (Office): +49-(0)221-5062-280
Email: drosse at mpipz.mpg.de