[BioC] Contrasts for 3x2 factorial experiment in R/edgeR
Benjamin King
bking at mdibl.org
Tue Dec 20 22:44:42 CET 2011
Hello,
I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts.
The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three).
# Group Strain Treatment
# A.Un A Un
# B.Un B Un
# C.Un C Un
# A.Stim A Stim
# B.Stim B Stim
# C.Stim C Stim
Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows.
Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4
Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4
Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4
Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4
I'm interested in these questions:
1. Which genes are differentially expressed in Strain B compared to Strain A?
2. Which genes are differentially expressed in Strain C compared to Strain A?
3. Which genes respond to Stimulated treatment overall?
4. Which genes respond to Stimulated treatment in Strain B?
5. Which genes respond to Stimulated treatment in Strain C?
6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function?
For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function?
Below is my current script and session information.
Thank you in advance for your help,
- Ben
library(edgeR)
# 3x2 Factorial Design
# Strain Treatment
# A Un
# B Un
# C Un
# A Stim
# B Stim
# C Stim
## Specify factors
Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A","A","A","B","B","B","B","C","C","C","C"))
Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim"))
## Read in count data
raw.data <- read.table("group_counts.txt",sep="\t",header=T)
d <- raw.data[, 2:24]
rownames(d) <- raw.data[, 1]
# make DGE object
dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Stim","C.Stim"))
dge <- calcNormFactors(dge)
# filter uninformative genes
m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
ridx <- rowSums(m > 1) >= 2
dge <- dge[ridx,]
## Design matrix
# treatment-contrast parameterization
contrasts(Strain) <- contr.sum(3)
contrasts(Treatment) <- contr.sum(2)
design <- model.matrix(~Strain*Treatment)
## Estimate Dispersion
dge <- estimateGLMCommonDisp(dge, design)
## Fit model with Common Dispersion
fit <- glmFit(dge, design, dispersion=dge$common.dispersion)
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.2.5
loaded via a namespace (and not attached):
[1] limma_3.8.3 tools_2.13.1
More information about the Bioconductor
mailing list