[BioC] limma: correcting for a correlation between contrasts

Gordon K Smyth smyth at wehi.EDU.AU
Fri Jul 12 01:59:52 CEST 2013


Dear Alex,

On Thu, 11 Jul 2013, Alex Gutteridge wrote:

> Yes, of course you're right T2-T1 is the simpler (correct) way to 
> express that contrast. Muddled thinking on my part.
>
> Your second question gets to the heart of my problem: I know that 
> wherever I set the thresholds for determining DE there will be genes 
> that are DE for T2 - Ctrl, but not for T1 - Ctrl, but where if the 
> magnitude of the T1 response were simply a little stronger they *would* 
> be DE.

The strong way to get genes specific to T2 is to use three conditions. 
Find that genes that are:

   DE for T2 - Ctrl,
   Not DE for T1- Ctrl, and
   DE for T2-T1

This ensures that all the chosen genes are clearly specific.  It chooses 
only genes that have clearly larger responses to T2 than to T1.  The 
theshold for DE can be chosen fairly loose for the third condition.

This is of course a general procedure, and does not take account of any 
assumed linear relationship between T2 and T1.

> Put another way, if we assume a linear dose response for the system to 
> T1, which other data suggests is not an unreasonable assumption here, an 
> increase in the T1 concentration would, for many, but not all, genes 
> produce the same response as T2 (or vice-versa a decrease in T2 
> concentration would produce the same response as T1). I want to get 
> those genes where this is *not* true.

Well, your plot of logFC for T2 vs logFC for T1 is already a good start. 
You could a least squares regression of T2 on T1 through zero:

   logFC.T1 <- fit3$coefficients[,1]
   logFC.T2 <- fit3$coefficients[,2]
   fit <- lm(logFC.T2 ~ 0 + logFC.T1)

It is important to force the regression through zero.  Suppose the slope 
coef(fit) is 0.7, for example.  Then the contrast you probably want to 
test is:

   cont <- makeContrasts(T2-C - 0.7*(T1-C), levels=design)
   fit2 <- contrasts.fit(fit,cont)
   fit2 <- eBayes(fit2)

This will find genes that don't fit the linear regression line relating 
the T2 response to the T1 response.

Best wishes
Gordon

> Put one more (final!) way: Imagine T1 and T2 are actually the same compound 
> but applied at 1uM and 2uM concentrations and that there is linear dose 
> response to the concentration of this compound for all genes. Looking for DE 
> will produce lots of genes between those two conditions, but fundamentally it 
> is the same response. I want to correct for the difference in magnitude and 
> so would hope for *no* reported DE in this (imaginary!) experiment.
>
> Rather than go back and tell the experimentalists to tweak their T1/T2 
> concentrations until the responses are equivalent, I feel there must be a 
> smart way (using some assumptions for sure) to model this in. One simple 
> approach that had occurred to me would be to just multiply all the T1 logFCs 
> by a common factor that produces a T1 response that *overall* is equivalent 
> to the T2 response. I can get that factor by simple linear regression of the 
> two responses, but I'm not sure where in the limma workflow it makes sense 
> (if at all) to try to shove that correction in.
>
> This test code and simulated data seems to do what I want (return 0 DE genes 
> in the final analysis). Is it valid to do this?
>
> ctrl     = rnorm(100,10,1)
> response = rnorm(100,0,1)
>
> t1   = ctrl + response
> t2   = ctrl + response * 2
>
> exprs = matrix(c(ctrl,ctrl+rnorm(100,0,0.1),ctrl+rnorm(100,0,0.1),
>                 t1,  t1+rnorm(100,0,0.1),  t1+rnorm(100,0,0.1),
>                 t2,  t2+rnorm(100,0,0.1),  t2+rnorm(100,0,0.1)),ncol=9)
>
> groups = factor(c(rep("C",3),rep("T1",3),rep("T2",3)))
>
> design = model.matrix(~0+groups)
> colnames(design) = levels(groups)
>
> contrasts = makeContrasts(T1 - C, T2 - C, levels=design)
>
> fit = lmFit(exprs,design)
> fit2 = contrasts.fit(fit,contrasts)
> fit3 = eBayes(fit2)
>
> model = lm(fit3$coefficients[,2] ~ fit3$coefficients[,1])
>
> contrasts = makeContrasts(T1 - C, T2 - C, (T1 - C)*model$coefficients[2] - 
> (T2 - C), levels=design)
>
> fit = lmFit(exprs,design)
> fit2 = contrasts.fit(fit,contrasts)
> fit3 = eBayes(fit2)
>
> topTable(fit3, coef=4)
>
> On 11.07.2013 12:54, Gordon K Smyth wrote:
>> Dear Alex,
>> 
>> The contrast (Ctrl-T1)-(Ctrl-T2) is just equal to T2-T1, it finds
>> genes for which T2 and T1 are not identical.
>> 
>> If you want T2 specific genes, would you not simply look for genes
>> that are DE for T2 - Ctrl but not for T1 - Ctrl?
>> 
>> Best wishes
>> Gordon
>> 
>>> Date: Thu, 11 Jul 2013 10:18:16 +0100
>>> From: Alex Gutteridge <alexg at ruggedtextile.com>
>>> To: <bioconductor at r-project.org>
>>> Subject: [BioC] limma: correcting for a correlation between contrasts
>>> 
>>> I have an experimental design (or rather question) that goes beyond my
>>> usual limma knowledge. I am hoping there is some elegant way to answer
>>> it that someone here can point me towards. The experimental design is
>>> straightforward: one control group (Ctrl) and two treatment groups (T1 &
>>> T2) with three replicates each:
>>> 
>>> S1  Ctrl
>>> S2  Ctrl
>>> S3  Ctrl
>>> S4  T1
>>> S5  T1
>>> S6  T1
>>> S7  T2
>>> S8  T2
>>> S9  T2
>>> 
>>> It is clear from some basic exploratory analysis that the effect of T2
>>> is very similar to T1, but has a slightly reduced response. I.e.
>>> plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2
>>> contrast gives a clear linear correlation but with slope <1. This is
>>> expected given what we know about the biology of the system, but we
>>> suspect there are some genes that will be specific for the T2 treatment.
>>> My first instinct is to setup a contrast for limma like: (Ctrl - T1) -
>>> (Ctrl - T2), but because the T2 response is across the board less than
>>> T1 this ends up just highlighting that fact rather than pulling out the
>>> T2 'specific' response.
>>> 
>>> Hopefully this code explains things visually with some simulated data:
>>> 
>>> T1.response = rnorm(100,0,1)
>>> T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
>>> T1.response[101:110] = rnorm(10,0,0.2)
>>> T2.response[101:110] = rnorm(10,1,0.2)
>>> 
>>> 
>>> plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("green",100),rep("black",10)),pch=16)
>>> 
>>> abline(0,1,col="blue")
>>> abline(-0.5,1,lty=2,col="blue")
>>> abline(0.5,1,lty=2,col="blue")
>>> 
>>> abline(model,col="red")
>>> 
>>> abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red")
>>> 
>>> abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red")
>>> 
>>> The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2
>>> specific genes (black) which I want, but also those genes outside the
>>> blue tramlines that are simply not responding as strongly to T2 as T1
>>> (which I don't want). In the real data, those genes out-number the
>>> really T2 specific genes considerably and dominate in any downstream
>>> GSEA type analysis. Is there a way to setup the contrast instead so
>>> limma corrects for the correlation between the two datasets and looks
>>> for genes with large residuals from the red (fitted) line rather than
>>> the blue?
>>> 
>>> -- Alex Gutteridge

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list