[BioC] edgeR: GLM for multi-factor and mulit-level designs

Tue Nov 27 22:01:28 CET 2012

I think you have both tests correct. Each coefficient in the model 
matrix from conds is an "X vs A" coefficient. So the "condsB" 
coefficient is the coefficient for B vs A. That means that you can test 
B vs A by simply testing that coefficient equal to zero, as you have 
done. Note that you could also make a contrast for the same test like this:

B_A<-makeContrasts(condsB,levels=design)

Essentially, you are testing the null hypothesis that the specified 
coefficient is zero for each gene.

which would make a vector with just one nonzero element. When you test 
the contrast of condsC-condsD, you are subtracting "D vs A" from "C vs 
A", which is the same as adding "C vs A" and "A vs D". Thus this 
difference gives "C vs D", which is what you want. Condition A is not 
being ignored; it simply cancels out of the contrast in question. 
Instead of testing whether a single coefficient is equal to zero, you 
are testing whether the difference of two coefficients is zero.

By the way, note the order: "condsC - condsD" will give the same 
statistics as "condsD - condsC", but will give opposite fold changes, so 
choose the one that makes the most sense to you (typically "treatment - 
control" for simple experiments).

On Tue 27 Nov 2012 07:18:20 AM PST, Dorota Herman wrote:
>
> Hello everyone,
>
> I have been playing with the GLM approach for RNA-seq data in DESeq
> and edgeR but I am fairly new in DE analyses. I am interested in
> pairwise comparisons in multi-factor multi-level designs. My question
> concerns my understanding of an application of the glmLRT function
>
> #My code is
>>
>> countsTable <- read.delim(file)
>> header <-
>
> c('A_1','A_2','A_3','B_1','B_2','B_3','C_1','C_2','C_3','D_1','D_2','D_3') 
>
>
>>
>> names(countsTable) <- header
>> conds <- factor(c('A','A','A’,'B','B','B','C','C','C','D','D','D'))
>> Ex<-factor(c('exper1', 'exper2', 'exper3', 'exper2', 'exper3', 'exper4',
>
> 'exper1', 'exper3', 'exper4', 'exper2', 'exper3', 'exper4'))
>>
>> group <- conds
>> dge <- DGEList(counts=countsTable,group=group)
>> dge <- calcNormFactors(dge, method='TMM')
>> design <- model.matrix(~Ex+conds)
>> rownames(design)<-colnames(dge)
>> dge <- estimateGLMCommonDisp(dge,design)
>> dge <- estimateGLMTrendedDisp(dge, design)
>> dge <- estimateGLMTagwiseDisp(dge, design)
>> fit <- glmFit(dge, design)
>
>
> #my design looks like:
>>
>> design
>
> (Intercept) Eexper2 Eexper3 Eexper4 condsB condsC condsD
> A_1 1 0 0 0 0 0 0
> A_2 1 1 0 0 0 0 0
> A_3 1 0 1 0 0 0 0
> B_1 1 1 0 0 1 0 0
> B_2 1 0 1 0 1 0 0
> B_3 1 0 0 1 1 0 0
> C_1 1 0 0 0 0 1 0
> C_2 1 0 1 0 0 1 0
> C_3 1 0 0 1 0 1 0
> D_1 1 1 0 0 0 0 1
> D_2 1 0 1 0 0 0 1
> D_3 1 0 0 1 0 0 1
> attr(,"assign")
> [1] 0 1 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$E
> [1] "contr.treatment"
> attr(,"contrasts")$conds
> [1] "contr.treatment"
>
> I understand that R function rewrote the model matrix because of the
> identifiability problem for parameter estimations. However it causes
> my confusion in further usage of that design for the pairwise
> comparisons.
>
> In a case when I want to obtain differentially expressed genes between
> A and B, I understand I should use the function:
>>
>> lrt <- glmLRT(fit,coef="condsB")
>
> Is it correct?
>
> In a case when I want to obtain differentially expressed genes between
> C and D (*without taking into account A*), are these calling functions
> correct?
>>
>> C_D<-makeContrasts(condsC-condsD,levels=design)
>> lrt <- glmLRT(fit,contrast=C_D)
>
>
> Does it mean that glmLRT function takes into account first conds (A)
> when we use ‘coef’ parameter and discard it when we use ‘contrast’
> parameter? Or it means that the second analysis, between C and D takes
> into account differential expression with A too?
>
> I hope my explanation of the question is not too confusing.
> Best wishes
> Dorota