[BioC] limma - complex design matrix / blocks / random effects

Wed Jun 4 05:44:45 CEST 2008

maanova purports to handle several random effects.

--Naomi

At 05:01 PM 6/3/2008, Yannick Wurm wrote:
>Hello List,
>
>I've got a bit of a complex design - perhaps you can offer me some
>advice. Thank you very much in advance for taking the time to read
>this mail. (I've highlighted my questions with *** if you just want
>to skip down)
>
>
>
>I have ~100 two-color spotted cDNA chips, making up a 20-timepoint
>timecourse, replicated with four different strains. (so ~20
>timepoints/strain in a loop, plus a few hybs against a reference
>RNA). But a few timepoints have only three replicated strains because
>something went wrong in the lab.
>
>Until now, I've been inspired by the end of section 8.2 of
>limma_2.12.0's limmaUsersGuide(), as follows:
>         ### below, my four strains are named A, D, G and H
>         design <- modelMatrix(targets, ref = "REF")
>         fit <- lmFit(MA, design)
>         cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 
> - (A0h+D0h +G0h+K0h)/4,
>                                 T_6h_vs0h= (A6h+D6h+G6h)/3 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_12h_vs0h= (A12h+D12h+G12h)/3 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 
> - (A0h+D0h+G0h+K0h)/4,
>                                 T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 
> - (A0h+D0h+G0h+K0h)/4,
>                                 T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 
> - (A0h+D0h+G0h+K0h)/4,
>                                 T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 
> - (A0h+D0h+G0h+K0h)/4,
>                                 T_3_vs0h= (A3+D3+G3+K3)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_4_vs0h= (A4+D4+G4+K4)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4,
>                                 T_6_vs0h= (A6+D6+G6+K6)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_7_vs0h= (A7+D7+G7+K7)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_8_vs0h= (A8+D8+G8+K8)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_9_vs0h= (A9+D9+G9+K9)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_10_vs0h= (A10+D10+G10+K10)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_11_vs0h= (A11+D11+G11+K11)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_12_vs0h= (A12+D12+G12+K12)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_13_vs0h= (A13+D13+G13+K13)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_14_vs0h= (A14+D14+G14+K14)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 T_15_vs0h= (A15+D15+G15+K15)/4 - 
> (A0h+D0h+G0h+K0h)/4,
>                                 levels=design)
>         fit2 <- contrasts.fit(fit, cont.matrix)
>
>Most RNA samples are used twice (eg: the "3hour" sample from strain A
>is used for both: A0h->A3h and: A3h->A6h), but some are used three
>times (this would be the case for A3h if I also had a A3h->REF chip).
>
>Limma is unable to estimate coefficients for many of my spots, ie 
>fit2 $coefficients give "NA" rows for too many spots. If I understand
>correctly, this could be because for example both "K0h" chips are
>have 5000 flagged spots in common (indeed, several chips have
>thousands of genepix auto-flagged weak spots).
>
>NAs are frustrating. One workaround that came up when consulting my
>local statistician: to treat technical replicates and biological
>replicates equally. But technical replicates have higher correlation
>than biological replicates (I calculated a few correlations by hand):
>         - technical replicate (same biological sample, same dye, different
>chip): median correlation = 0.55
>         - biological replicate (different sample, but same timepoint and
>dye, different chip): median correlation = 0.35
>Thus I think it would be incorrect to treat my technical replicates
>as biological replicates.
>
>So I'm examining alternative means of creating a design matrix.
>
>Ideally, I'd like to use "strain" as a random effect here. *** Is it
>correct that the only way to add a random effect is with lmFit's
>block argument?:
>         lmFit(myMA, design=myDesign, block=factor(targets$Colony),
>correlation=0.55)
>duplicateCorrelation cannot be used on a complex design. *** Is it
>correct to use the block argument on such a design, with a
>correlation I calculated "by hand"? ***
>
>Alternatively, I'll just put everything as fixed factors, but it
>feels wrong:
>         extraFactors = 
> model.matrix(~0+factor(targets$Strain)+factor(targets $Batch))  # 
> several strains, only two possible batches
>         colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch")
>         design_bothFixed = cbind(modelMatrix(targets, ref="REF"), 
> extraFactors)
>*** Would the design matrix I just created be a correct way of
>proceeding?
>
>***Since limma is limited to a single random factor, how should I
>choose which between Strain and Batch to use as fixed and which as
>random?
>
>Thanks very much :o)
>
>Yannick
>
>
>--------------------------------------------
>          yannick . wurm @ unil . ch
>Ant Genomics, Ecology & Evolution @ Lausanne
>   http://www.unil.ch/dee/page28685_fr.html
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111