[BioC] edgeR: GLM for multi-factor and mulit-level designs
Ryan C. Thompson
rct at thompsonclan.org
Tue Nov 27 22:01:28 CET 2012
I think you have both tests correct. Each coefficient in the model
matrix from conds is an "X vs A" coefficient. So the "condsB"
coefficient is the coefficient for B vs A. That means that you can test
B vs A by simply testing that coefficient equal to zero, as you have
done. Note that you could also make a contrast for the same test like this:
B_A<-makeContrasts(condsB,levels=design)
Essentially, you are testing the null hypothesis that the specified
coefficient is zero for each gene.
which would make a vector with just one nonzero element. When you test
the contrast of condsC-condsD, you are subtracting "D vs A" from "C vs
A", which is the same as adding "C vs A" and "A vs D". Thus this
difference gives "C vs D", which is what you want. Condition A is not
being ignored; it simply cancels out of the contrast in question.
Instead of testing whether a single coefficient is equal to zero, you
are testing whether the difference of two coefficients is zero.
By the way, note the order: "condsC - condsD" will give the same
statistics as "condsD - condsC", but will give opposite fold changes, so
choose the one that makes the most sense to you (typically "treatment -
control" for simple experiments).
On Tue 27 Nov 2012 07:18:20 AM PST, Dorota Herman wrote:
>
> Hello everyone,
>
> I have been playing with the GLM approach for RNA-seq data in DESeq
> and edgeR but I am fairly new in DE analyses. I am interested in
> pairwise comparisons in multi-factor multi-level designs. My question
> concerns my understanding of an application of the glmLRT function
>
> #My code is
>>
>> countsTable <- read.delim(file)
>> header <-
>
> c('A_1','A_2','A_3','B_1','B_2','B_3','C_1','C_2','C_3','D_1','D_2','D_3')
>
>
>>
>> names(countsTable) <- header
>> conds <- factor(c('A','A','A’,'B','B','B','C','C','C','D','D','D'))
>> Ex<-factor(c('exper1', 'exper2', 'exper3', 'exper2', 'exper3', 'exper4',
>
> 'exper1', 'exper3', 'exper4', 'exper2', 'exper3', 'exper4'))
>>
>> group <- conds
>> dge <- DGEList(counts=countsTable,group=group)
>> dge <- calcNormFactors(dge, method='TMM')
>> design <- model.matrix(~Ex+conds)
>> rownames(design)<-colnames(dge)
>> dge <- estimateGLMCommonDisp(dge,design)
>> dge <- estimateGLMTrendedDisp(dge, design)
>> dge <- estimateGLMTagwiseDisp(dge, design)
>> fit <- glmFit(dge, design)
>
>
> #my design looks like:
>>
>> design
>
> (Intercept) Eexper2 Eexper3 Eexper4 condsB condsC condsD
> A_1 1 0 0 0 0 0 0
> A_2 1 1 0 0 0 0 0
> A_3 1 0 1 0 0 0 0
> B_1 1 1 0 0 1 0 0
> B_2 1 0 1 0 1 0 0
> B_3 1 0 0 1 1 0 0
> C_1 1 0 0 0 0 1 0
> C_2 1 0 1 0 0 1 0
> C_3 1 0 0 1 0 1 0
> D_1 1 1 0 0 0 0 1
> D_2 1 0 1 0 0 0 1
> D_3 1 0 0 1 0 0 1
> attr(,"assign")
> [1] 0 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$E
> [1] "contr.treatment"
> attr(,"contrasts")$conds
> [1] "contr.treatment"
>
> I understand that R function rewrote the model matrix because of the
> identifiability problem for parameter estimations. However it causes
> my confusion in further usage of that design for the pairwise
> comparisons.
>
> In a case when I want to obtain differentially expressed genes between
> A and B, I understand I should use the function:
>>
>> lrt <- glmLRT(fit,coef="condsB")
>
> Is it correct?
>
> In a case when I want to obtain differentially expressed genes between
> C and D (*without taking into account A*), are these calling functions
> correct?
>>
>> C_D<-makeContrasts(condsC-condsD,levels=design)
>> lrt <- glmLRT(fit,contrast=C_D)
>
>
> Does it mean that glmLRT function takes into account first conds (A)
> when we use ‘coef’ parameter and discard it when we use ‘contrast’
> parameter? Or it means that the second analysis, between C and D takes
> into account differential expression with A too?
>
> I hope my explanation of the question is not too confusing.
> Best wishes
> Dorota
More information about the Bioconductor
mailing list