[BioC] limma: correcting for a correlation between contrasts

Sat Jul 13 01:00:50 CEST 2013

Dear Alex,

As a further thought, it would be worthwhile to run genas() on your data:

   genas(fit3, plot=TRUE)

This is because the experimental design is such that logFC.T2 and logFC.T1 
would show a correlation of about 0.5 even if neither T2 and T1 had any 
true differential expression.  This is because both of the contrasts are 
relative to the same control, a fact that induces a technical correlation 
in the observed fold changes.

The genas() function is able to deconvolve the technical and true 
biological components of the correlation between two contrasts.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Fri, 12 Jul 2013, Gordon K Smyth wrote:

> Dear Alex,
>
> On Thu, 11 Jul 2013, Alex Gutteridge wrote:
>
>> Yes, of course you're right T2-T1 is the simpler (correct) way to express 
>> that contrast. Muddled thinking on my part.
>> 
>> Your second question gets to the heart of my problem: I know that wherever 
>> I set the thresholds for determining DE there will be genes that are DE for 
>> T2 - Ctrl, but not for T1 - Ctrl, but where if the magnitude of the T1 
>> response were simply a little stronger they *would* be DE.
>
> The strong way to get genes specific to T2 is to use three conditions. Find 
> that genes that are:
>
>  DE for T2 - Ctrl,
>  Not DE for T1- Ctrl, and
>  DE for T2-T1
>
> This ensures that all the chosen genes are clearly specific.  It chooses only 
> genes that have clearly larger responses to T2 than to T1.  The theshold for 
> DE can be chosen fairly loose for the third condition.
>
> This is of course a general procedure, and does not take account of any 
> assumed linear relationship between T2 and T1.
>
>> Put another way, if we assume a linear dose response for the system to T1, 
>> which other data suggests is not an unreasonable assumption here, an 
>> increase in the T1 concentration would, for many, but not all, genes 
>> produce the same response as T2 (or vice-versa a decrease in T2 
>> concentration would produce the same response as T1). I want to get those 
>> genes where this is *not* true.
>
> Well, your plot of logFC for T2 vs logFC for T1 is already a good start. You 
> could a least squares regression of T2 on T1 through zero:
>
>  logFC.T1 <- fit3$coefficients[,1]
>  logFC.T2 <- fit3$coefficients[,2]
>  fit <- lm(logFC.T2 ~ 0 + logFC.T1)
>
> It is important to force the regression through zero.  Suppose the slope 
> coef(fit) is 0.7, for example.  Then the contrast you probably want to test 
> is:
>
>  cont <- makeContrasts(T2-C - 0.7*(T1-C), levels=design)
>  fit2 <- contrasts.fit(fit,cont)
>  fit2 <- eBayes(fit2)
>
> This will find genes that don't fit the linear regression line relating the 
> T2 response to the T1 response.
>
> Best wishes
> Gordon
>
>> Put one more (final!) way: Imagine T1 and T2 are actually the same compound 
>> but applied at 1uM and 2uM concentrations and that there is linear dose 
>> response to the concentration of this compound for all genes. Looking for 
>> DE will produce lots of genes between those two conditions, but 
>> fundamentally it is the same response. I want to correct for the difference 
>> in magnitude and so would hope for *no* reported DE in this (imaginary!) 
>> experiment.
>> 
>> Rather than go back and tell the experimentalists to tweak their T1/T2 
>> concentrations until the responses are equivalent, I feel there must be a 
>> smart way (using some assumptions for sure) to model this in. One simple 
>> approach that had occurred to me would be to just multiply all the T1 
>> logFCs by a common factor that produces a T1 response that *overall* is 
>> equivalent to the T2 response. I can get that factor by simple linear 
>> regression of the two responses, but I'm not sure where in the limma 
>> workflow it makes sense (if at all) to try to shove that correction in.
>> 
>> This test code and simulated data seems to do what I want (return 0 DE 
>> genes in the final analysis). Is it valid to do this?
>> 
>> ctrl     = rnorm(100,10,1)
>> response = rnorm(100,0,1)
>> 
>> t1   = ctrl + response
>> t2   = ctrl + response * 2
>> 
>> exprs = matrix(c(ctrl,ctrl+rnorm(100,0,0.1),ctrl+rnorm(100,0,0.1),
>>                 t1,  t1+rnorm(100,0,0.1),  t1+rnorm(100,0,0.1),
>>                 t2,  t2+rnorm(100,0,0.1),  t2+rnorm(100,0,0.1)),ncol=9)
>> 
>> groups = factor(c(rep("C",3),rep("T1",3),rep("T2",3)))
>> 
>> design = model.matrix(~0+groups)
>> colnames(design) = levels(groups)
>> 
>> contrasts = makeContrasts(T1 - C, T2 - C, levels=design)
>> 
>> fit = lmFit(exprs,design)
>> fit2 = contrasts.fit(fit,contrasts)
>> fit3 = eBayes(fit2)
>> 
>> model = lm(fit3$coefficients[,2] ~ fit3$coefficients[,1])
>> 
>> contrasts = makeContrasts(T1 - C, T2 - C, (T1 - C)*model$coefficients[2] - 
>> (T2 - C), levels=design)
>> 
>> fit = lmFit(exprs,design)
>> fit2 = contrasts.fit(fit,contrasts)
>> fit3 = eBayes(fit2)
>> 
>> topTable(fit3, coef=4)
>> 
>> On 11.07.2013 12:54, Gordon K Smyth wrote:
>>> Dear Alex,
>>> 
>>> The contrast (Ctrl-T1)-(Ctrl-T2) is just equal to T2-T1, it finds
>>> genes for which T2 and T1 are not identical.
>>> 
>>> If you want T2 specific genes, would you not simply look for genes
>>> that are DE for T2 - Ctrl but not for T1 - Ctrl?
>>> 
>>> Best wishes
>>> Gordon
>>> 
>>>> Date: Thu, 11 Jul 2013 10:18:16 +0100
>>>> From: Alex Gutteridge <alexg at ruggedtextile.com>
>>>> To: <bioconductor at r-project.org>
>>>> Subject: [BioC] limma: correcting for a correlation between contrasts
>>>> 
>>>> I have an experimental design (or rather question) that goes beyond my
>>>> usual limma knowledge. I am hoping there is some elegant way to answer
>>>> it that someone here can point me towards. The experimental design is
>>>> straightforward: one control group (Ctrl) and two treatment groups (T1 &
>>>> T2) with three replicates each:
>>>> 
>>>> S1  Ctrl
>>>> S2  Ctrl
>>>> S3  Ctrl
>>>> S4  T1
>>>> S5  T1
>>>> S6  T1
>>>> S7  T2
>>>> S8  T2
>>>> S9  T2
>>>> 
>>>> It is clear from some basic exploratory analysis that the effect of T2
>>>> is very similar to T1, but has a slightly reduced response. I.e.
>>>> plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2
>>>> contrast gives a clear linear correlation but with slope <1. This is
>>>> expected given what we know about the biology of the system, but we
>>>> suspect there are some genes that will be specific for the T2 treatment.
>>>> My first instinct is to setup a contrast for limma like: (Ctrl - T1) -
>>>> (Ctrl - T2), but because the T2 response is across the board less than
>>>> T1 this ends up just highlighting that fact rather than pulling out the
>>>> T2 'specific' response.
>>>> 
>>>> Hopefully this code explains things visually with some simulated data:
>>>> 
>>>> T1.response = rnorm(100,0,1)
>>>> T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
>>>> T1.response[101:110] = rnorm(10,0,0.2)
>>>> T2.response[101:110] = rnorm(10,1,0.2)
>>>> 
>>>> 
>>>> plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("green",100),rep("black",10)),pch=16)
>>>> 
>>>> abline(0,1,col="blue")
>>>> abline(-0.5,1,lty=2,col="blue")
>>>> abline(0.5,1,lty=2,col="blue")
>>>> 
>>>> abline(model,col="red")
>>>> 
>>>> abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red")
>>>> 
>>>> abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red")
>>>> 
>>>> The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2
>>>> specific genes (black) which I want, but also those genes outside the
>>>> blue tramlines that are simply not responding as strongly to T2 as T1
>>>> (which I don't want). In the real data, those genes out-number the
>>>> really T2 specific genes considerably and dominate in any downstream
>>>> GSEA type analysis. Is there a way to setup the contrast instead so
>>>> limma corrects for the correlation between the two datasets and looks
>>>> for genes with large residuals from the red (fitted) line rather than
>>>> the blue?
>>>> 
>>>> -- Alex Gutteridge
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}