[BioC] Experimental Design, Model Matrix and Contrasts
Nathan.Watson-Haigh at csiro.au
Nathan.Watson-Haigh at csiro.au
Thu Jul 17 01:59:17 CEST 2008
I'm getting myself confused with setting up these things for a limma
analysis. Please stop me and correct me if I'm wrong with any of this!
I have two factors each with two levels:
Treatment - "C" and "T"
Gestation - "d60" and "d67"
I setup up my model and design matrix as follows:
TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".")
TG<-factor(TG,levels=unique(TG))
design<-model.matrix(~0+TG)
colnames(design)<-levels(TG)
which gave the following design matrix:
C.d60 T.d60 C.d67 T.d67
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 0 1 0 0
6 0 1 0 0
7 0 1 0 0
8 0 1 0 0
9 0 1 0 0
10 0 0 1 0
11 0 0 1 0
12 0 0 1 0
13 0 0 1 0
14 0 0 0 1
15 0 0 0 1
16 0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$TG
[1] "contr.treatment"
So I can use contrasts to pull out comparisons of Control Day 60 (C.d60)
versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus Treatment
Day 67 (T.d67), the treatment main effect as well as the interaction
terms using the following:
cont.matrix<-makeContrasts(
Treatment_d60 = T.d60 - C.d60,
Treatment_d67 = T.d67 - C.d67,
Treatment = (T.d60+T.d67) - (C.d60+C.d67),
Diff = (T.d67-C.d67) - (T.d60-C.d60),
levels=design
)
Which gives the following contrast matrix:
Contrasts
Levels Treatment_d60 Treatment_d67 Treatment Diff
C.d60 -1 0 -1 1
T.d60 1 0 1 -1
C.d67 0 -1 -1 -1
T.d67 0 1 1 1
I've now realised that there is contamination in some of my samples and
would like to include this in the model so I can try to remove it's
effect and try to increase the power to detect differentially expressed
genes for the contrasts show above. However I'm not clear on how to
proceed. I don't think the above approach can be extended to include the
contamination effect. So I though something like this might work/be
correct:
Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment))
Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation))
Contamination<-factor(targets$Contamination,
levels=unique(targets$Contamination))
design<-model.matrix(~0+Treatment*Gestation+Contamination)
Which give the following design matrix:
TreatmentC TreatmentT Gestationd67 Contamination1
TreatmentT:Gestationd67
1 1 0 0 0
0
2 1 0 0 1
0
3 1 0 0 0
0
4 1 0 0 1
0
5 0 1 0 0
0
6 0 1 0 1
0
7 0 1 0 1
0
8 0 1 0 1
0
9 0 1 0 0
0
10 1 0 1 0
0
11 1 0 1 0
0
12 1 0 1 0
0
13 1 0 1 1
0
14 0 1 1 0
1
15 0 1 1 0
1
16 0 1 1 0
1
attr(,"assign")
[1] 1 1 2 3 4
attr(,"contrasts")
attr(,"contrasts")$Treatment
[1] "contr.treatment"
attr(,"contrasts")$Gestation
[1] "contr.treatment"
attr(,"contrasts")$Contamination
[1] "contr.treatment"
I now have a few questions:
1) When specifying the model, what's the difference between using
~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter
the interpretation of the design matrix?
2) Using the above design matrix (assuming it is correct/the best way to
go forward), how would I get the Treatment effect, Gestation effect,
interaction, as well as the treatment effect for d60 Gestation and the
treatment effect for d67?
3) Out of interest, how would I find out the genes affected by the
contamination?
I'm somewhat confused, so if I'm making any serious errors, please let
me know!
Nathan
-------------------------------------------------------------
Dr. Nathan S. Watson-Haigh (publish under Haigh, N.S.)
OCE Post Doctoral Fellow
CSIRO Livestock Industries
J M Rendel Laboratory
Rockhampton
QLD 4701 Tel: +61 (0)7 4923 8121
Australia Fax: +61 (0)7 4923 8222
More information about the Bioconductor
mailing list