[BioC] limma - complex design matrix / blocks / random effects
Naomi Altman
naomi at stat.psu.edu
Wed Jun 4 05:44:45 CEST 2008
maanova purports to handle several random effects.
--Naomi
At 05:01 PM 6/3/2008, Yannick Wurm wrote:
>Hello List,
>
>I've got a bit of a complex design - perhaps you can offer me some
>advice. Thank you very much in advance for taking the time to read
>this mail. (I've highlighted my questions with *** if you just want
>to skip down)
>
>
>
>I have ~100 two-color spotted cDNA chips, making up a 20-timepoint
>timecourse, replicated with four different strains. (so ~20
>timepoints/strain in a loop, plus a few hybs against a reference
>RNA). But a few timepoints have only three replicated strains because
>something went wrong in the lab.
>
>Until now, I've been inspired by the end of section 8.2 of
>limma_2.12.0's limmaUsersGuide(), as follows:
> ### below, my four strains are named A, D, G and H
> design <- modelMatrix(targets, ref = "REF")
> fit <- lmFit(MA, design)
> cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4
> - (A0h+D0h +G0h+K0h)/4,
> T_6h_vs0h= (A6h+D6h+G6h)/3 -
> (A0h+D0h+G0h+K0h)/4,
> T_12h_vs0h= (A12h+D12h+G12h)/3 -
> (A0h+D0h+G0h+K0h)/4,
> T_18h_vs0h= (A18h+D18h+G18h+K18h)/4
> - (A0h+D0h+G0h+K0h)/4,
> T_24h_vs0h= (A24h+D24h+G24h+K24h)/4
> - (A0h+D0h+G0h+K0h)/4,
> T_36h_vs0h= (A36h+D36h+G36h+K36h)/4
> - (A0h+D0h+G0h+K0h)/4,
> T_48h_vs0h= (A48h+D48h+G48h+K48h)/4
> - (A0h+D0h+G0h+K0h)/4,
> T_3_vs0h= (A3+D3+G3+K3)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_4_vs0h= (A4+D4+G4+K4)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4,
> T_6_vs0h= (A6+D6+G6+K6)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_7_vs0h= (A7+D7+G7+K7)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_8_vs0h= (A8+D8+G8+K8)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_9_vs0h= (A9+D9+G9+K9)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_10_vs0h= (A10+D10+G10+K10)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_11_vs0h= (A11+D11+G11+K11)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_12_vs0h= (A12+D12+G12+K12)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_13_vs0h= (A13+D13+G13+K13)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_14_vs0h= (A14+D14+G14+K14)/4 -
> (A0h+D0h+G0h+K0h)/4,
> T_15_vs0h= (A15+D15+G15+K15)/4 -
> (A0h+D0h+G0h+K0h)/4,
> levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
>
>Most RNA samples are used twice (eg: the "3hour" sample from strain A
>is used for both: A0h->A3h and: A3h->A6h), but some are used three
>times (this would be the case for A3h if I also had a A3h->REF chip).
>
>Limma is unable to estimate coefficients for many of my spots, ie
>fit2 $coefficients give "NA" rows for too many spots. If I understand
>correctly, this could be because for example both "K0h" chips are
>have 5000 flagged spots in common (indeed, several chips have
>thousands of genepix auto-flagged weak spots).
>
>NAs are frustrating. One workaround that came up when consulting my
>local statistician: to treat technical replicates and biological
>replicates equally. But technical replicates have higher correlation
>than biological replicates (I calculated a few correlations by hand):
> - technical replicate (same biological sample, same dye, different
>chip): median correlation = 0.55
> - biological replicate (different sample, but same timepoint and
>dye, different chip): median correlation = 0.35
>Thus I think it would be incorrect to treat my technical replicates
>as biological replicates.
>
>So I'm examining alternative means of creating a design matrix.
>
>Ideally, I'd like to use "strain" as a random effect here. *** Is it
>correct that the only way to add a random effect is with lmFit's
>block argument?:
> lmFit(myMA, design=myDesign, block=factor(targets$Colony),
>correlation=0.55)
>duplicateCorrelation cannot be used on a complex design. *** Is it
>correct to use the block argument on such a design, with a
>correlation I calculated "by hand"? ***
>
>Alternatively, I'll just put everything as fixed factors, but it
>feels wrong:
> extraFactors =
> model.matrix(~0+factor(targets$Strain)+factor(targets $Batch)) #
> several strains, only two possible batches
> colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch")
> design_bothFixed = cbind(modelMatrix(targets, ref="REF"),
> extraFactors)
>*** Would the design matrix I just created be a correct way of
>proceeding?
>
>***Since limma is limited to a single random factor, how should I
>choose which between Strain and Batch to use as fixed and which as
>random?
>
>Thanks very much :o)
>
>Yannick
>
>
>--------------------------------------------
> yannick . wurm @ unil . ch
>Ant Genomics, Ecology & Evolution @ Lausanne
> http://www.unil.ch/dee/page28685_fr.html
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
More information about the Bioconductor
mailing list