[BioC] Contrasts for 3x2 factorial experiment in R/edgeR

Tue Dec 20 22:44:42 CET 2011

Hello,

I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts.  

The design of my experiment has two factors, strain and treatment.  There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated).  I have four biological replicates (except for one sample group where there are only three).

# Group   Strain  Treatment
# A.Un     A           Un
# B.Un     B           Un
# C.Un    C           Un
# A.Stim  A           Stim
# B.Stim  B           Stim
# C.Stim  C          Stim

Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows.

Intercept:  (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
Strain1:  (-A.Un + B.Un - A.Stim + B.Stim)/4
Strain2:  (-A.Un + C.Un - A.Stim + C.Stim)/4
Treatment1:  (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
Strain1:Treatment1:  (A.Un - B.Un - A.Stim + B.Stim)/4
Strain2:Treatment2:  (A.Un - C.Un - A.Stim + C.Stim)/4

I'm interested in these questions:

1. Which genes are differentially expressed in Strain B compared to Strain A?
2. Which genes are differentially expressed in Strain C compared to Strain A?
3. Which genes respond to Stimulated treatment overall?
4. Which genes respond to Stimulated treatment in Strain B?
5. Which genes respond to Stimulated treatment in Strain C?
6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?

I believe questions 1, 2, 3, 6 and 7 are model coefficients.  If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function?

For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function?

Below is my current script and session information.

Thank you in advance for your help,

- Ben

library(edgeR)

# 3x2 Factorial Design
# Strain  Treatment
# A       Un      
# B       Un
# C       Un
# A       Stim
# B       Stim
# C       Stim

## Specify factors
Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A","A","A","B","B","B","B","C","C","C","C"))
Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim"))

## Read in count data
raw.data <- read.table("group_counts.txt",sep="\t",header=T)
d <- raw.data[, 2:24]
rownames(d) <- raw.data[, 1]

# make DGE object
dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Stim","C.Stim"))
dge <- calcNormFactors(dge)

# filter uninformative genes
m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
ridx <- rowSums(m > 1) >= 2
dge <- dge[ridx,]

## Design matrix
# treatment-contrast parameterization
contrasts(Strain) <- contr.sum(3)
contrasts(Treatment) <- contr.sum(2)
design <- model.matrix(~Strain*Treatment)

## Estimate Dispersion
dge <- estimateGLMCommonDisp(dge, design)

## Fit model with Common Dispersion
fit <- glmFit(dge, design, dispersion=dge$common.dispersion)

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_2.2.5

loaded via a namespace (and not attached):
[1] limma_3.8.3  tools_2.13.1