[BioC] limma: correcting for a correlation between contrasts
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Jul 13 01:00:50 CEST 2013
Dear Alex,
As a further thought, it would be worthwhile to run genas() on your data:
genas(fit3, plot=TRUE)
This is because the experimental design is such that logFC.T2 and logFC.T1
would show a correlation of about 0.5 even if neither T2 and T1 had any
true differential expression. This is because both of the contrasts are
relative to the same control, a fact that induces a technical correlation
in the observed fold changes.
The genas() function is able to deconvolve the technical and true
biological components of the correlation between two contrasts.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Fri, 12 Jul 2013, Gordon K Smyth wrote:
> Dear Alex,
>
> On Thu, 11 Jul 2013, Alex Gutteridge wrote:
>
>> Yes, of course you're right T2-T1 is the simpler (correct) way to express
>> that contrast. Muddled thinking on my part.
>>
>> Your second question gets to the heart of my problem: I know that wherever
>> I set the thresholds for determining DE there will be genes that are DE for
>> T2 - Ctrl, but not for T1 - Ctrl, but where if the magnitude of the T1
>> response were simply a little stronger they *would* be DE.
>
> The strong way to get genes specific to T2 is to use three conditions. Find
> that genes that are:
>
> DE for T2 - Ctrl,
> Not DE for T1- Ctrl, and
> DE for T2-T1
>
> This ensures that all the chosen genes are clearly specific. It chooses only
> genes that have clearly larger responses to T2 than to T1. The theshold for
> DE can be chosen fairly loose for the third condition.
>
> This is of course a general procedure, and does not take account of any
> assumed linear relationship between T2 and T1.
>
>> Put another way, if we assume a linear dose response for the system to T1,
>> which other data suggests is not an unreasonable assumption here, an
>> increase in the T1 concentration would, for many, but not all, genes
>> produce the same response as T2 (or vice-versa a decrease in T2
>> concentration would produce the same response as T1). I want to get those
>> genes where this is *not* true.
>
> Well, your plot of logFC for T2 vs logFC for T1 is already a good start. You
> could a least squares regression of T2 on T1 through zero:
>
> logFC.T1 <- fit3$coefficients[,1]
> logFC.T2 <- fit3$coefficients[,2]
> fit <- lm(logFC.T2 ~ 0 + logFC.T1)
>
> It is important to force the regression through zero. Suppose the slope
> coef(fit) is 0.7, for example. Then the contrast you probably want to test
> is:
>
> cont <- makeContrasts(T2-C - 0.7*(T1-C), levels=design)
> fit2 <- contrasts.fit(fit,cont)
> fit2 <- eBayes(fit2)
>
> This will find genes that don't fit the linear regression line relating the
> T2 response to the T1 response.
>
> Best wishes
> Gordon
>
>> Put one more (final!) way: Imagine T1 and T2 are actually the same compound
>> but applied at 1uM and 2uM concentrations and that there is linear dose
>> response to the concentration of this compound for all genes. Looking for
>> DE will produce lots of genes between those two conditions, but
>> fundamentally it is the same response. I want to correct for the difference
>> in magnitude and so would hope for *no* reported DE in this (imaginary!)
>> experiment.
>>
>> Rather than go back and tell the experimentalists to tweak their T1/T2
>> concentrations until the responses are equivalent, I feel there must be a
>> smart way (using some assumptions for sure) to model this in. One simple
>> approach that had occurred to me would be to just multiply all the T1
>> logFCs by a common factor that produces a T1 response that *overall* is
>> equivalent to the T2 response. I can get that factor by simple linear
>> regression of the two responses, but I'm not sure where in the limma
>> workflow it makes sense (if at all) to try to shove that correction in.
>>
>> This test code and simulated data seems to do what I want (return 0 DE
>> genes in the final analysis). Is it valid to do this?
>>
>> ctrl = rnorm(100,10,1)
>> response = rnorm(100,0,1)
>>
>> t1 = ctrl + response
>> t2 = ctrl + response * 2
>>
>> exprs = matrix(c(ctrl,ctrl+rnorm(100,0,0.1),ctrl+rnorm(100,0,0.1),
>> t1, t1+rnorm(100,0,0.1), t1+rnorm(100,0,0.1),
>> t2, t2+rnorm(100,0,0.1), t2+rnorm(100,0,0.1)),ncol=9)
>>
>> groups = factor(c(rep("C",3),rep("T1",3),rep("T2",3)))
>>
>> design = model.matrix(~0+groups)
>> colnames(design) = levels(groups)
>>
>> contrasts = makeContrasts(T1 - C, T2 - C, levels=design)
>>
>> fit = lmFit(exprs,design)
>> fit2 = contrasts.fit(fit,contrasts)
>> fit3 = eBayes(fit2)
>>
>> model = lm(fit3$coefficients[,2] ~ fit3$coefficients[,1])
>>
>> contrasts = makeContrasts(T1 - C, T2 - C, (T1 - C)*model$coefficients[2] -
>> (T2 - C), levels=design)
>>
>> fit = lmFit(exprs,design)
>> fit2 = contrasts.fit(fit,contrasts)
>> fit3 = eBayes(fit2)
>>
>> topTable(fit3, coef=4)
>>
>> On 11.07.2013 12:54, Gordon K Smyth wrote:
>>> Dear Alex,
>>>
>>> The contrast (Ctrl-T1)-(Ctrl-T2) is just equal to T2-T1, it finds
>>> genes for which T2 and T1 are not identical.
>>>
>>> If you want T2 specific genes, would you not simply look for genes
>>> that are DE for T2 - Ctrl but not for T1 - Ctrl?
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> Date: Thu, 11 Jul 2013 10:18:16 +0100
>>>> From: Alex Gutteridge <alexg at ruggedtextile.com>
>>>> To: <bioconductor at r-project.org>
>>>> Subject: [BioC] limma: correcting for a correlation between contrasts
>>>>
>>>> I have an experimental design (or rather question) that goes beyond my
>>>> usual limma knowledge. I am hoping there is some elegant way to answer
>>>> it that someone here can point me towards. The experimental design is
>>>> straightforward: one control group (Ctrl) and two treatment groups (T1 &
>>>> T2) with three replicates each:
>>>>
>>>> S1 Ctrl
>>>> S2 Ctrl
>>>> S3 Ctrl
>>>> S4 T1
>>>> S5 T1
>>>> S6 T1
>>>> S7 T2
>>>> S8 T2
>>>> S9 T2
>>>>
>>>> It is clear from some basic exploratory analysis that the effect of T2
>>>> is very similar to T1, but has a slightly reduced response. I.e.
>>>> plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2
>>>> contrast gives a clear linear correlation but with slope <1. This is
>>>> expected given what we know about the biology of the system, but we
>>>> suspect there are some genes that will be specific for the T2 treatment.
>>>> My first instinct is to setup a contrast for limma like: (Ctrl - T1) -
>>>> (Ctrl - T2), but because the T2 response is across the board less than
>>>> T1 this ends up just highlighting that fact rather than pulling out the
>>>> T2 'specific' response.
>>>>
>>>> Hopefully this code explains things visually with some simulated data:
>>>>
>>>> T1.response = rnorm(100,0,1)
>>>> T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
>>>> T1.response[101:110] = rnorm(10,0,0.2)
>>>> T2.response[101:110] = rnorm(10,1,0.2)
>>>>
>>>>
>>>> plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("green",100),rep("black",10)),pch=16)
>>>>
>>>> abline(0,1,col="blue")
>>>> abline(-0.5,1,lty=2,col="blue")
>>>> abline(0.5,1,lty=2,col="blue")
>>>>
>>>> abline(model,col="red")
>>>>
>>>> abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red")
>>>>
>>>> abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red")
>>>>
>>>> The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2
>>>> specific genes (black) which I want, but also those genes outside the
>>>> blue tramlines that are simply not responding as strongly to T2 as T1
>>>> (which I don't want). In the real data, those genes out-number the
>>>> really T2 specific genes considerably and dominate in any downstream
>>>> GSEA type analysis. Is there a way to setup the contrast instead so
>>>> limma corrects for the correlation between the two datasets and looks
>>>> for genes with large residuals from the red (fitted) line rather than
>>>> the blue?
>>>>
>>>> -- Alex Gutteridge
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list