[BioC] limma: correcting for a correlation between contrasts

Thu Jul 11 11:18:16 CEST 2013

I have an experimental design (or rather question) that goes beyond my 
usual limma knowledge. I am hoping there is some elegant way to answer 
it that someone here can point me towards. The experimental design is 
straightforward: one control group (Ctrl) and two treatment groups (T1 & 
T2) with three replicates each:

S1  Ctrl
S2  Ctrl
S3  Ctrl
S4  T1
S5  T1
S6  T1
S7  T2
S8  T2
S9  T2

It is clear from some basic exploratory analysis that the effect of T2 
is very similar to T1, but has a slightly reduced response. I.e. 
plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2 
contrast gives a clear linear correlation but with slope <1. This is 
expected given what we know about the biology of the system, but we 
suspect there are some genes that will be specific for the T2 treatment. 
My first instinct is to setup a contrast for limma like: (Ctrl - T1) - 
(Ctrl - T2), but because the T2 response is across the board less than 
T1 this ends up just highlighting that fact rather than pulling out the 
T2 'specific' response.

Hopefully this code explains things visually with some simulated data:

T1.response = rnorm(100,0,1)
T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
T1.response[101:110] = rnorm(10,0,0.2)
T2.response[101:110] = rnorm(10,1,0.2)

plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("green",100),rep("black",10)),pch=16)

abline(0,1,col="blue")
abline(-0.5,1,lty=2,col="blue")
abline(0.5,1,lty=2,col="blue")

abline(model,col="red")
abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red")
abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red")

The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2 
specific genes (black) which I want, but also those genes outside the 
blue tramlines that are simply not responding as strongly to T2 as T1 
(which I don't want). In the real data, those genes out-number the 
really T2 specific genes considerably and dominate in any downstream 
GSEA type analysis. Is there a way to setup the contrast instead so 
limma corrects for the correlation between the two datasets and looks 
for genes with large residuals from the red (fitted) line rather than 
the blue?

-- 
Alex Gutteridge