[BioC] Experimental Design, Model Matrix and Contrasts

Nathan.Watson-Haigh at csiro.au Nathan.Watson-Haigh at csiro.au
Mon Jul 21 02:54:36 CEST 2008


If I do as you say, I have three factors with two levels each:
Treatment - "C" and "T"
Gestation - "d60" and "d67"
Contamination - 0 and 1

I set up the design matrix as follows:
TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation,
pData(eSet)$Muscle_Contamination, sep=".")
TG<-factor(TG,levels=unique(TG))
design<-model.matrix(~0+TG)
colnames(design)<-levels(TG)

Which gives the following design matrix:
   C.d60.0 C.d60.1 T.d60.0 T.d60.1 C.d67.0 C.d67.1 T.d67.0
1        1       0       0       0       0       0       0
2        0       1       0       0       0       0       0
3        1       0       0       0       0       0       0
4        0       1       0       0       0       0       0
5        0       0       1       0       0       0       0
6        0       0       0       1       0       0       0
7        0       0       0       1       0       0       0
8        0       0       0       1       0       0       0
9        0       0       1       0       0       0       0
10       0       0       0       0       1       0       0
11       0       0       0       0       1       0       0
12       0       0       0       0       1       0       0
13       0       0       0       0       0       1       0
14       0       0       0       0       0       0       1
15       0       0       0       0       0       0       1
16       0       0       0       0       0       0       1
attr(,"assign")
[1] 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$TG
[1] "contr.treatment"

For a contrast of C.d60 vs T.d60, I then need to group together C.d60.0
and C.d60.1 don't I? Doing this I have a contrast marix like this:
cont.matrix<-makeContrasts(
	Treatment_d60 = (T.d60.0+T.d60.1) - (C.d60.0+C.d60.1),
	Treatment_d67 = (T.d67.0) - (C.d67.0+C.d67.1),
	Treatment     = (T.d60.0+T.d60.1+T.d67.0) -
(C.d60.0+C.d60.1+C.d67.0+C.d67.1),
	Diff          = ((T.d67.0) - (C.d67.0+C.d67.1)) -
((T.d60.0+T.d60.1) - (C.d60.0+C.d60.1)),
	Contamination = (T.d60.1+C.d60.1+C.d67.1) -
(T.d60.0+C.d60.0+T.d67.0+C.d67.0),
	levels=design
)

Which is:
         Contrasts
Levels    Treatment_d60 Treatment_d67 Treatment Diff Contamination
  C.d60.0            -1             0        -1    1            -1
  C.d60.1            -1             0        -1    1             1
  T.d60.0             1             0         1   -1            -1
  T.d60.1             1             0         1   -1             1
  C.d67.0             0            -1        -1   -1            -1
  C.d67.1             0            -1        -1   -1             1
  T.d67.0             0             1         1    1            -1


First of all does this look like it should be doing what I want i.e.
removing the contamination effect? Or do in order to get the
Treatment_d60 effect, do I need to "remove" the contamination effect
explicitly? E.g. a contast like this:
Treatment_d60 = (T.d60.0+T.d60.1) - (C.d60.0+C.d60.1) -
((T.d60.1+C.d60.1+C.d67.1) - (T.d60.0+C.d60.0+T.d67.0+C.d67.0))

Cheers,
Nathan


-----Original Message-----
From: Mark Cowley [mailto:m.cowley0 at gmail.com] 
Sent: Monday, 21 July 2008 8:55 AM
To: Watson-Haigh, Nathan (LI, Rock. Rendel)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Experimental Design, Model Matrix and Contrasts

Hi Nathan,
How about adding the contamination factor to your original design, do  
the first lmFit, then then fit exactly the same contrasts as you have  
specified. The effect of contamination will be taken care of in the  
linear model, then you are left with 'cleaner' estimates of the C/T  
d60/67 parameters, which your contrasts can pick out.

cheers,
Mark

On 17/07/2008, at 9:59 AM, <Nathan.Watson-Haigh at csiro.au>
<Nathan.Watson-Haigh at csiro.au 
 > wrote:

> I'm getting myself confused with setting up these things for a limma
> analysis. Please stop me and correct me if I'm wrong with any of this!
>
> I have two factors each with two levels:
> Treatment - "C" and "T"
> Gestation - "d60" and "d67"
>
> I setup up my model and design matrix as follows:
>
> TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".")
> TG<-factor(TG,levels=unique(TG))
> design<-model.matrix(~0+TG)
> colnames(design)<-levels(TG)
>
> which gave the following design matrix:
>
> C.d60 T.d60 C.d67 T.d67
> 1 1 0 0 0
> 2 1 0 0 0
> 3 1 0 0 0
> 4 1 0 0 0
> 5 0 1 0 0
> 6 0 1 0 0
> 7 0 1 0 0
> 8 0 1 0 0
> 9 0 1 0 0
> 10 0 0 1 0
> 11 0 0 1 0
> 12 0 0 1 0
> 13 0 0 1 0
> 14 0 0 0 1
> 15 0 0 0 1
> 16 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$TG
> [1] "contr.treatment"
>
> So I can use contrasts to pull out comparisons of Control Day 60  
> (C.d60)
> versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus  
> Treatment
> Day 67 (T.d67), the treatment main effect as well as the interaction
> terms using the following:
>
> cont.matrix<-makeContrasts(
> Treatment_d60 = T.d60 - C.d60,
> Treatment_d67 = T.d67 - C.d67,
> Treatment = (T.d60+T.d67) - (C.d60+C.d67),
> Diff = (T.d67-C.d67) - (T.d60-C.d60),
> levels=design
> )
>
> Which gives the following contrast matrix:
>
> Contrasts
> Levels Treatment_d60 Treatment_d67 Treatment Diff
> C.d60 -1 0 -1 1
> T.d60 1 0 1 -1
> C.d67 0 -1 -1 -1
> T.d67 0 1 1 1
>
> I've now realised that there is contamination in some of my samples  
> and
> would like to include this in the model so I can try to remove it's
> effect and try to increase the power to detect differentially  
> expressed
> genes for the contrasts show above. However I'm not clear on how to
> proceed. I don't think the above approach can be extended to include  
> the
> contamination effect. So I though something like this might work/be
> correct:
>
> Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment))
> Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation))
> Contamination<-factor(targets$Contamination,
> levels=unique(targets$Contamination))
> design<-model.matrix(~0+Treatment*Gestation+Contamination)
>
> Which give the following design matrix:
>   TreatmentC TreatmentT Gestationd67 Contamination1
> TreatmentT:Gestationd67
> 1           1          0            0              0
> 0
> 2           1          0            0              1
> 0
> 3           1          0            0              0
> 0
> 4           1          0            0              1
> 0
> 5           0          1            0              0
> 0
> 6           0          1            0              1
> 0
> 7           0          1            0              1
> 0
> 8           0          1            0              1
> 0
> 9           0          1            0              0
> 0
> 10          1          0            1              0
> 0
> 11          1          0            1              0
> 0
> 12          1          0            1              0
> 0
> 13          1          0            1              1
> 0
> 14          0          1            1              0
> 1
> 15          0          1            1              0
> 1
> 16          0          1            1              0
> 1
> attr(,"assign")
> [1] 1 1 2 3 4
> attr(,"contrasts")
> attr(,"contrasts")$Treatment
> [1] "contr.treatment"
>
> attr(,"contrasts")$Gestation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Contamination
> [1] "contr.treatment"
>
>
> I now have a few questions:
> 1) When specifying the model, what's the difference between using
> ~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter
> the interpretation of the design matrix?
> 2) Using the above design matrix (assuming it is correct/the best  
> way to
> go forward), how would I get the Treatment effect, Gestation effect,
> interaction, as well as the treatment effect for d60 Gestation and the
> treatment effect for d67?
> 3) Out of interest, how would I find out the genes affected by the
> contamination?
>
> I'm somewhat confused, so if I'm making any serious errors, please let
> me know!
> Nathan
>
> -------------------------------------------------------------
> Dr. Nathan S. Watson-Haigh        (publish under Haigh, N.S.)
> OCE Post Doctoral Fellow
> CSIRO Livestock Industries
> J M Rendel Laboratory
> Rockhampton
> QLD 4701                              Tel: +61 (0)7 4923 8121
> Australia                             Fax: +61 (0)7 4923 8222
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list