[BioC] limma - complex design matrix / blocks / random effects

Tue Jun 3 23:01:00 CEST 2008

Hello List,

I've got a bit of a complex design - perhaps you can offer me some  
advice. Thank you very much in advance for taking the time to read  
this mail. (I've highlighted my questions with *** if you just want  
to skip down)

I have ~100 two-color spotted cDNA chips, making up a 20-timepoint  
timecourse, replicated with four different strains. (so ~20  
timepoints/strain in a loop, plus a few hybs against a reference  
RNA). But a few timepoints have only three replicated strains because  
something went wrong in the lab.

Until now, I've been inspired by the end of section 8.2 of  
limma_2.12.0's limmaUsersGuide(), as follows:
	### below, my four strains are named A, D, G and H
	design <- modelMatrix(targets, ref = "REF")
	fit <- lmFit(MA, design)
	cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 - (A0h+D0h 
+G0h+K0h)/4,
				T_6h_vs0h= (A6h+D6h+G6h)/3 - (A0h+D0h+G0h+K0h)/4,
				T_12h_vs0h= (A12h+D12h+G12h)/3 - (A0h+D0h+G0h+K0h)/4,
				T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 - (A0h+D0h+G0h+K0h)/4,
				T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 - (A0h+D0h+G0h+K0h)/4,
				T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 - (A0h+D0h+G0h+K0h)/4,
				T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 - (A0h+D0h+G0h+K0h)/4,
				T_3_vs0h= (A3+D3+G3+K3)/4 - (A0h+D0h+G0h+K0h)/4,
				T_4_vs0h= (A4+D4+G4+K4)/4 - (A0h+D0h+G0h+K0h)/4,
				T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4,
				T_6_vs0h= (A6+D6+G6+K6)/4 - (A0h+D0h+G0h+K0h)/4,
				T_7_vs0h= (A7+D7+G7+K7)/4 - (A0h+D0h+G0h+K0h)/4,
				T_8_vs0h= (A8+D8+G8+K8)/4 - (A0h+D0h+G0h+K0h)/4,
				T_9_vs0h= (A9+D9+G9+K9)/4 - (A0h+D0h+G0h+K0h)/4,
				T_10_vs0h= (A10+D10+G10+K10)/4 - (A0h+D0h+G0h+K0h)/4,
				T_11_vs0h= (A11+D11+G11+K11)/4 - (A0h+D0h+G0h+K0h)/4,
				T_12_vs0h= (A12+D12+G12+K12)/4 - (A0h+D0h+G0h+K0h)/4,
				T_13_vs0h= (A13+D13+G13+K13)/4 - (A0h+D0h+G0h+K0h)/4,
				T_14_vs0h= (A14+D14+G14+K14)/4 - (A0h+D0h+G0h+K0h)/4,
				T_15_vs0h= (A15+D15+G15+K15)/4 - (A0h+D0h+G0h+K0h)/4,
				levels=design)
	fit2 <- contrasts.fit(fit, cont.matrix)

Most RNA samples are used twice (eg: the "3hour" sample from strain A  
is used for both: A0h->A3h and: A3h->A6h), but some are used three  
times (this would be the case for A3h if I also had a A3h->REF chip).

Limma is unable to estimate coefficients for many of my spots, ie fit2 
$coefficients give "NA" rows for too many spots. If I understand  
correctly, this could be because for example both "K0h" chips are  
have 5000 flagged spots in common (indeed, several chips have  
thousands of genepix auto-flagged weak spots).

NAs are frustrating. One workaround that came up when consulting my  
local statistician: to treat technical replicates and biological  
replicates equally. But technical replicates have higher correlation  
than biological replicates (I calculated a few correlations by hand):
	- technical replicate (same biological sample, same dye, different  
chip): median correlation = 0.55
	- biological replicate (different sample, but same timepoint and  
dye, different chip): median correlation = 0.35
Thus I think it would be incorrect to treat my technical replicates  
as biological replicates.

So I'm examining alternative means of creating a design matrix.

Ideally, I'd like to use "strain" as a random effect here. *** Is it  
correct that the only way to add a random effect is with lmFit's  
block argument?:
	lmFit(myMA, design=myDesign, block=factor(targets$Colony),  
correlation=0.55)
duplicateCorrelation cannot be used on a complex design. *** Is it  
correct to use the block argument on such a design, with a  
correlation I calculated "by hand"? ***

Alternatively, I'll just put everything as fixed factors, but it  
feels wrong:
	extraFactors = model.matrix(~0+factor(targets$Strain)+factor(targets 
$Batch))  # several strains, only two possible batches
	colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch")
	design_bothFixed = cbind(modelMatrix(targets, ref="REF"), extraFactors)
*** Would the design matrix I just created be a correct way of  
proceeding?

***Since limma is limited to a single random factor, how should I  
choose which between Strain and Batch to use as fixed and which as  
random?

Thanks very much :o)

Yannick

--------------------------------------------
          yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
   http://www.unil.ch/dee/page28685_fr.html