[BioC] limma design and contrast matrix for paired experiment

Mon Jan 14 00:14:18 CET 2013

Dear Gordon,

Thank you for your reply. Following the example from the edgeR users
guide, I revised the model:

CellLine <- factor(rep(c('C1','C1','C2','C2'),3),levels=c('C1','C2'))
Sample <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6),levels=c(1,2,3,4,5,6))
Treatment <- factor(rep(c('C','T'),6),levels=c('C','T'))

design <- model.matrix( ~ CellLine + CellLine:Sample + CellLine:Treatment   )

I am however, not, a bit unsure of which coefficients I am interested
in. It seems logical that I should be interested in
'CellLineC1:TreatmentT' and 'CellLineC2:TreatmentT' to answer the
question of 'Which genes are differentially expressed in CellLine 1
due to treatment X' and 'Which genes are differentially expressed in
CellLine 2 due to treatment X'. Is that correct?

Best,
David

2013/1/13 Gordon K Smyth <smyth at wehi.edu.au>:
> Dear David,
>
> No, your design matrix is incorrect because it ignores the pairing by sample
> in your experimental design.
>
> You can treat Sample as either a fixed or a random factor.  The fixed
> approach is closely analogous to a classical paired t-test.  For an
> experiment like yours, the fixed approach is explained in Section 3.5 of the
> edgeR User's Guide:
>
> http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>
> You can follow the same construction of the design matrix even though you
> are using limma.
>
> The random approach is a bit more aggressive.  For an experiment like yours,
> the random approach is explained in Section 8.7 of the limma User's Guide:
>
> http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf
>
> I would probably recommend the first approach for your data.  However the
> second approach is necessary if you want to test for differences between the
> two cell lines.
>
> Best wishes
> Gordon
>
>> Date: Fri, 11 Jan 2013 18:31:23 +0100
>> From: David Westergaard <david at harsk.dk>
>> To: bioconductor at r-project.org
>> Subject: [BioC]  limma design and contrast matrix for paired
>>         experiment
>>
>> Hello,
>>
>> I am analysing microarray data performed on two cell cultures, in
>> which the gene expression were measured before (C) and after treatment
>> (T), so that the targets look like this:
>>
>> Cell-line       Treatment       Sample
>> Cell 1  C       1
>> Cell 1  T       1
>> Cell 2  C       2
>> Cell 2  T       2
>> Cell 1  C       3
>> Cell 1  T       3
>> Cell 2  C       4
>> Cell 2  T       4
>> Cell 1  C       5
>> Cell 1  T       5
>> Cell 2  C       6
>> Cell 2  T       6
>>
>> All experiments were performed on single-channel Agilent arrays, with
>> 4 samples pr. slide. I am interested in determining the differentially
>> expressed genes between Cell1 before and after treatment, as well as
>> Cell2 before and after treatment. This is the preliminary code:
>>
>> # Load and normalize data
>> RG <-
>> read.maimages(targets$FileName,source="agilent.median",green.only=TRUE)
>> # Assume there is a col called FileName in the targets section
>> RG <- backgroundCorrect(RG, method="normexp", offset=16)
>> RGNorm <- normalizeBetweenArrays(RG, method="quantile")
>> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName)
>>
>> # Create design
>> Pairing <-
>> paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2),rep(c('C','T'),6),sep='')
>> pair <- factor(Pairing,levels=unique(Pairing))
>> design <- model.matrix( ~ 0 + pair )
>> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T')
>>
>> # Fit data
>> fit <- lmFit(RGNorm.ave, design=design)
>>
>> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2)
>>
>>
>> For the experiment, is the design/contrast matrix a proper choice to
>> answer the questions of 'Which genes are differentially expressed in
>> Cell 1' and  'Which genes are differentially expressed in Cell 2'?
>> Further, should I do any technical correction, such as
>> duplicateCorrelation or similar? The reason I am asking is that even
>> at p<=0.01 I am getting a very high number of differentially expressed
>> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and
>> I want to make sure this is biological significance, and not some
>> technical aspect I have missed.
>>
>> Thanks in advance.
>>
>> Best,
>> David
>>
>>
>>> sessionInfo()
>>
>> R version 2.14.1 (2011-12-22)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] statmod_1.4.16 limma_3.10.3
>>
>> loaded via a namespace (and not attached):
>> [1] tcltk_2.14.1 tools_2.14.1
>>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}