[BioC] How to average over duplicate spots [Was: Re: technical replicates and spots in limma]

Fri Apr 18 15:52:32 CEST 2008

Gordon Smyth <smyth at ...> writes:

> 
> I really hesitate to explain how to average of duplicate spots using limma, 
> because it is not something I generally recommend. It is however quite easy:
> 
> Start with a normalized MAList object, 'MA', possibly containing spot 
> quality weights. Suppose that there are 'ndups' duplicates at spacing 
> 'spacing'. You can average over duplicates by
> 
> fit1 <- lmFit(MA, design=diag(ncol(MA)), ndups=ndups, spacing=spacing, 
> correlation=0)
> 
> Now the averaged log-ratios are in
> 
> Y <- fit1$coef
> 
> and the consolidated spot quality weights are in
> 
> w <- 1/fit1$stdev.unscaled^2
> 
> Now you can fit any model you like to the averaged log-ratios, e.g.,
> 
> fit <- lmFit(Y, design, weights=w)
> etc

Hi Gordon, thanks for the tip on how to easily average duplicates.  I have 2.x
follow up questions:

A) I know averaging is not ideal given we throw away the information on gene
wise variance that could otherwise add to our ability to detect differential
expression.  

As stated in help(duplicateCorrelation), for the results to have validity,
there needs to be at least 2 more arrays than the number of coefficients to be
estimated (ie at least 2 replicates).  If I were doing a 13 time point series
of arrays, is it necessary to replicate all of them or could I select just a
few time points of interest?  Also, better to do bio or tech replicates?

B) Secondly, I used your code (written above) to average the spots, but when I
tried to calculate standard errors using eBayes I get the following error 
message:

Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim =
stdev.coef.lim) : 
  No residual degrees of freedom in linear model fits

I guess I don't understand why df.residual is 0 for all of my genes and would
like to know what I can do to remedy this.  Perhaps I can't use eBayes?  But I
also thought that eBayes was separate and independent of duplicateCorrelation
for adding degrees of freedom to SE estimates?

My experiment is a 13 time point series of arrays in control (LB) and treatment
(AKI)  conditions.  

The design matrix (abbreviated) is:

	AKI100   AKI125...AKI600  LB100   LB125...LB600
[1,]	1        0        0       0       0       0
[2,]	0        1        0       0       0       0
 .      .        .        .       .       .       .
 .      .        .        .       .       .       .
 .      .        .        .       .       .       .
[26,]   0        0        0       0       0       1

And the rest of the code:
isGene <- (MAsort$genes$Status %in% "mRNA")
fit <- lmFit(MA.norm[isGene, ], design=design, ndups=3, spacing=1, correlation=0)
Ave <- fit$coef
w <- 1/fit$stdev.unscaled^2
fit1 <- lmFit(Ave, design=design, weights=w) 
cont.dif <- makeContrasts(fit125 = (AKI125 - LB125) - (AKI100 - LB100),
fit150 = (AKI150 - LB150) - (AKI100 - LB100),
fit215 = (AKI215 - LB215) - (AKI100 - LB100),
fit240 = (AKI240 - LB240) - (AKI100 - LB100),
fit305 = (AKI305 - LB305) - (AKI100 - LB100), 
fit330 = (AKI330 - LB330) - (AKI100 - LB100), 
fit355 = (AKI355 - LB355) - (AKI100 - LB100), 
fit420 = (AKI420 - LB420) - (AKI100 - LB100), 
fit445 = (AKI445 - LB445) - (AKI100 - LB100), 
fit510 = (AKI510 - LB510) - (AKI100 - LB100), 
fit535 = (AKI535 - LB535) - (AKI100 - LB100), 
fit600 = (AKI600 - LB600) - (AKI100 - LB100), levels=design)
fit2 <- contrasts.fit(fit1, cont.dif)
fit2 <- eBayes(fit2)

> sessionInfo()
R version 2.6.2 (2008-02-08) 
i386-apple-darwin8.10.1 

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] statmod_1.3.3 limma_2.12.0 

Thank you!
Sanjat