[BioC] Contrasts for 3x2 factorial experiment in R/edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Jan 4 00:23:34 CET 2012
Dear Ben,
Your design matrix is still using the sum-to-zero parametrization, which I
find a bit unhelpful in the genomic context. As I said in my previous
reply, I will leave interpretations of that parametrization to you.
Best wishes
Gordon
On Tue, 3 Jan 2012, Benjamin King wrote:
> Dear Gordon,
>
> I very much appreciate your reply and I will follow your suggestion of
> not using the sum-to-zero parameterization.
>
> I do have one additional question about interpreting the model
> coefficients. If my design matrix is as shown below and the model is
> "~Strain+Strain:Treatment" is the correct interpretation of the
> coefficients?
>
> Coefficient Interpretation
> Intercept
> Strain1 "Strain B compared to Strain A (i.e., Which genes are differentially expressed in Strain B compared to Strain A?)"
> Strain2 "Strain C compared to Strain A (i.e., Which genes are differentially expressed in Strain C compared to Strain A?)"
> Treatment1 "Overall treatment effect (i.e., Which genes respond to Stimulated treatment overall?)"
> Strain1:Treatment1 "Interaction between StrainB and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?)"
> Strain2:Treatment2 "Interaction between StrainC and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain C compared to Strain A?)"
>
> (Intercept) Strain1 Strain2 Treatment1 Strain1:Treatment1 Strain2:Treatment1
> A.Un 1 -1 -1 -1 1 1
> A.Un 1 -1 -1 -1 1 1
> A.Un 1 -1 -1 -1 1 1
> B.Un 1 1 0 -1 -1 0
> B.Un 1 1 0 -1 -1 0
> B.Un 1 1 0 -1 -1 0
> B.Un 1 1 0 -1 -1 0
> C.Un 1 0 0 -1 0 -1
> C.Un 1 0 0 -1 0 -1
> C.Un 1 0 0 -1 0 -1
> C.Un 1 0 0 -1 0 -1
> A.Stim 1 -1 -1 1 -1 -1
> A.Stim 1 -1 -1 1 -1 -1
> A.Stim 1 -1 -1 1 -1 -1
> A.Stim 1 -1 -1 1 -1 -1
> B.Stim 1 1 0 1 1 0
> B.Stim 1 1 0 1 1 0
> B.Stim 1 1 0 1 1 0
> B.Stim 1 1 0 1 1 0
> C.Stim 1 0 1 1 0 1
> C.Stim 1 0 1 1 0 1
> C.Stim 1 0 1 1 0 1
> C.Stim 1 0 1 1 0 1
>
> If this is correct, then is the interaction between strain B and treatment the fifth coefficient or c(0,0,0,0,1,0)?
>
> Likewise, is the interaction between strain C and treatment the sixth coefficient or c(0,0,0,0,0,1)?
>
> Thank you,
>
> - Ben
>
>
> On Jan 2, 2012, at 6:16 PM, Gordon K Smyth wrote:
>
>> Dear Ben,
>>
>> glmLRT() allows you to specify the contrast to be tested in two ways. If the contrast happens to correspond to a column of your design matrix, then it is one the original coefficients and you can just specify which one using the coef= argument. Otherwise you can use the contrast= argument to specify any contrast, using a numerical vector of the same length as the number of coefficients.
>>
>> Suppose you used the default treatment parametrization:
>>
>> contrasts(Strain) <- contr.treat(3)
>> contrasts(Treatment) <- contr.treat(2)
>>
>> and fitted a model with Treatment nested within Strain:
>>
>> design <- model.matrix(~Strain+Strain:Treatment)
>>
>> Then the coefficients 4:6 would be the logFC after stimulation in strains A to C respectively. Hence you could test for differential expression after stimulation in strain B by:
>>
>> glmLRT(dge,fit,coef=5)
>>
>> and so on. You could equally well specify the same test by
>>
>> glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0))
>>
>> To test for genes responding differently to simulation in strain B compared to strain A, you would use
>>
>> contrast=c(0,0,0,-1,1,0)
>>
>> And so on.
>>
>> Figuring out the appropriate contrast vectors using the sum-to-zero
>> parametrization specified by contr.sum() is more tedious, and I will
>> leave that to you.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Tue, 20 Dec 2011 16:44:42 -0500
>>> From: Benjamin King <bking at mdibl.org>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR
>>>
>>> Hello,
>>>
>>> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts.
>>>
>>> The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three).
>>>
>>> # Group Strain Treatment
>>> # A.Un A Un
>>> # B.Un B Un
>>> # C.Un C Un
>>> # A.Stim A Stim
>>> # B.Stim B Stim
>>> # C.Stim C Stim
>>>
>>> Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows.
>>>
>>> Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
>>> Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4
>>> Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4
>>> Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
>>> Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4
>>> Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4
>>>
>>> I'm interested in these questions:
>>>
>>> 1. Which genes are differentially expressed in Strain B compared to Strain A?
>>> 2. Which genes are differentially expressed in Strain C compared to Strain A?
>>> 3. Which genes respond to Stimulated treatment overall?
>>> 4. Which genes respond to Stimulated treatment in Strain B?
>>> 5. Which genes respond to Stimulated treatment in Strain C?
>>> 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
>>> 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?
>>>
>>> I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function?
>>>
>>> For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function?
>>>
>>> Below is my current script and session information.
>>>
>>> Thank you in advance for your help,
>>>
>>> - Ben
>>>
>>>
>>>
>>> library(edgeR)
>>>
>>> # 3x2 Factorial Design
>>> # Strain Treatment
>>> # A Un
>>> # B Un
>>> # C Un
>>> # A Stim
>>> # B Stim
>>> # C Stim
>>>
>>> ## Specify factors
>>> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A","A","A","B","B","B","B","C","C","C","C"))
>>> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim"))
>>>
>>> ## Read in count data
>>> raw.data <- read.table("group_counts.txt",sep="\t",header=T)
>>> d <- raw.data[, 2:24]
>>> rownames(d) <- raw.data[, 1]
>>>
>>> # make DGE object
>>> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Stim","C.Stim"))
>>> dge <- calcNormFactors(dge)
>>>
>>> # filter uninformative genes
>>> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
>>> ridx <- rowSums(m > 1) >= 2
>>> dge <- dge[ridx,]
>>>
>>> ## Design matrix
>>> # treatment-contrast parameterization
>>> contrasts(Strain) <- contr.sum(3)
>>> contrasts(Treatment) <- contr.sum(2)
>>> design <- model.matrix(~Strain*Treatment)
>>>
>>> ## Estimate Dispersion
>>> dge <- estimateGLMCommonDisp(dge, design)
>>>
>>> ## Fit model with Common Dispersion
>>> fit <- glmFit(dge, design, dispersion=dge$common.dispersion)
>>>
>>>
>>>
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] splines stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] edgeR_2.2.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] limma_3.8.3 tools_2.13.1
>>
>
>
>
>
>
>
>
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list