[BioC] edgeR - ANOVA-like test for differential gene expression
Benedikt Drosse
drosse at mpipz.mpg.de
Wed Jun 26 17:43:38 CEST 2013
Dear edgeR users,
I worked my way through the user's guide of edgeR. However, I still have
some questions and problems understanding the anova-like way of
analyzing differential gene expression. I would be very grateful for any
help or suggestion. I hope I did not overlook any earlier post
concerning the same topic.
My RNAseq data are derived from 4 different developmental stages with 3
biological replicates each (see overview). I am interested in genes
differentially expressed between any of the developmental stages.
I am not sure, which way of analysis is best to answer this question.
Making the "Anova-like" model asking for all contrasts at once or doing
all possible contrasts one by one.
In principle the "anova-like" way of analysis should fit best to my
question, if I understood the edgeR user's guide correctly. So I make a
model matrix as given in "design" and fit the glm model for the effect
of Development including the intercept. To ask for expression
differences between any of the developmental stages I use "coef=2:4" in
the call of glmLRT.
So my questions are:
Will coef=2:4 yield only the contrasts of Development2-Development1,
Development3-Development1, Development4-Development1 (as it is described
in the user's guide),
or will it also test the contrasts of Development3-Development2,
Development4-Development2 and Development4-Development3, which would
also be important contrasts for me to analyze?
For further filtering and candidate gene identification I would like to
extract the logFoldChanges and significance level between any of the
developmental stages.
If the "anova-like" analysis provides all of the six possible contrasts,
is there an easy way to extract logFC and significance level for any of
these contrasts from the resulting glmLRT-output?
If the anova-like analysis does NOT yield ALL of the possible contrasts,
I would have to make the contrasts one by one.
In this case would it be more useful to build the glm model without an
intercept (see design_2), and then ask for the six contrasts separately
as indicated in the comment of glm_data_2?
Doing so it would be easy for me to get the logFC and the corresponding
signficance level.
Is it conceptionally wrong to do the single contrasts instead of an
anova-like analysis concerning the multiple testing problem (6 tests
instead of 1)?
I will be very grateful for any comment on my questions,
Best regards,
Benedikt Digel
#R code:
overview = matrix(nrow=12, ncol=1)
overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4))
colnames(overview) = "Development"
rownames(overview) = paste("library_", 1:12, sep="")
design = model.matrix(object = ~Development)
fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not
provide the corresponding data for the analysis, since it should just
indicate the way I call glmFit
glm_data = glmLRT(glmfit=fit, coef=2:4)
design_2 = model.matrix(object=~0+Development)
fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2) # I
did not provide the corresponding data for the analysis, since it should
just indicate the way I call glmFit
glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) #
c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all
possible contrasts to extract effects of all comparisons
--
Benedikt Digel geb. Drosse
PhD student
AG von Korff
Max Planck Institute for Plant Breeding Research
Dept. Plant Developmental Biology
Carl-von-Linné-Weg 10
50829 Cologne
Germany
Tel. (Office): +49-(0)221-5062-280
Email: drosse at mpipz.mpg.de
More information about the Bioconductor
mailing list