[BioC] limma - complex design matrix / blocks / random effects
Yannick Wurm
Yannick.Wurm at unil.ch
Tue Jun 3 23:01:00 CEST 2008
Hello List,
I've got a bit of a complex design - perhaps you can offer me some
advice. Thank you very much in advance for taking the time to read
this mail. (I've highlighted my questions with *** if you just want
to skip down)
I have ~100 two-color spotted cDNA chips, making up a 20-timepoint
timecourse, replicated with four different strains. (so ~20
timepoints/strain in a loop, plus a few hybs against a reference
RNA). But a few timepoints have only three replicated strains because
something went wrong in the lab.
Until now, I've been inspired by the end of section 8.2 of
limma_2.12.0's limmaUsersGuide(), as follows:
### below, my four strains are named A, D, G and H
design <- modelMatrix(targets, ref = "REF")
fit <- lmFit(MA, design)
cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 - (A0h+D0h
+G0h+K0h)/4,
T_6h_vs0h= (A6h+D6h+G6h)/3 - (A0h+D0h+G0h+K0h)/4,
T_12h_vs0h= (A12h+D12h+G12h)/3 - (A0h+D0h+G0h+K0h)/4,
T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 - (A0h+D0h+G0h+K0h)/4,
T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 - (A0h+D0h+G0h+K0h)/4,
T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 - (A0h+D0h+G0h+K0h)/4,
T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 - (A0h+D0h+G0h+K0h)/4,
T_3_vs0h= (A3+D3+G3+K3)/4 - (A0h+D0h+G0h+K0h)/4,
T_4_vs0h= (A4+D4+G4+K4)/4 - (A0h+D0h+G0h+K0h)/4,
T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4,
T_6_vs0h= (A6+D6+G6+K6)/4 - (A0h+D0h+G0h+K0h)/4,
T_7_vs0h= (A7+D7+G7+K7)/4 - (A0h+D0h+G0h+K0h)/4,
T_8_vs0h= (A8+D8+G8+K8)/4 - (A0h+D0h+G0h+K0h)/4,
T_9_vs0h= (A9+D9+G9+K9)/4 - (A0h+D0h+G0h+K0h)/4,
T_10_vs0h= (A10+D10+G10+K10)/4 - (A0h+D0h+G0h+K0h)/4,
T_11_vs0h= (A11+D11+G11+K11)/4 - (A0h+D0h+G0h+K0h)/4,
T_12_vs0h= (A12+D12+G12+K12)/4 - (A0h+D0h+G0h+K0h)/4,
T_13_vs0h= (A13+D13+G13+K13)/4 - (A0h+D0h+G0h+K0h)/4,
T_14_vs0h= (A14+D14+G14+K14)/4 - (A0h+D0h+G0h+K0h)/4,
T_15_vs0h= (A15+D15+G15+K15)/4 - (A0h+D0h+G0h+K0h)/4,
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
Most RNA samples are used twice (eg: the "3hour" sample from strain A
is used for both: A0h->A3h and: A3h->A6h), but some are used three
times (this would be the case for A3h if I also had a A3h->REF chip).
Limma is unable to estimate coefficients for many of my spots, ie fit2
$coefficients give "NA" rows for too many spots. If I understand
correctly, this could be because for example both "K0h" chips are
have 5000 flagged spots in common (indeed, several chips have
thousands of genepix auto-flagged weak spots).
NAs are frustrating. One workaround that came up when consulting my
local statistician: to treat technical replicates and biological
replicates equally. But technical replicates have higher correlation
than biological replicates (I calculated a few correlations by hand):
- technical replicate (same biological sample, same dye, different
chip): median correlation = 0.55
- biological replicate (different sample, but same timepoint and
dye, different chip): median correlation = 0.35
Thus I think it would be incorrect to treat my technical replicates
as biological replicates.
So I'm examining alternative means of creating a design matrix.
Ideally, I'd like to use "strain" as a random effect here. *** Is it
correct that the only way to add a random effect is with lmFit's
block argument?:
lmFit(myMA, design=myDesign, block=factor(targets$Colony),
correlation=0.55)
duplicateCorrelation cannot be used on a complex design. *** Is it
correct to use the block argument on such a design, with a
correlation I calculated "by hand"? ***
Alternatively, I'll just put everything as fixed factors, but it
feels wrong:
extraFactors = model.matrix(~0+factor(targets$Strain)+factor(targets
$Batch)) # several strains, only two possible batches
colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch")
design_bothFixed = cbind(modelMatrix(targets, ref="REF"), extraFactors)
*** Would the design matrix I just created be a correct way of
proceeding?
***Since limma is limited to a single random factor, how should I
choose which between Strain and Batch to use as fixed and which as
random?
Thanks very much :o)
Yannick
--------------------------------------------
yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
http://www.unil.ch/dee/page28685_fr.html
More information about the Bioconductor
mailing list