[BioC] Interspecies differential expression of orthologs with Edger
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Aug 27 03:45:47 CEST 2014
It works something like this:
library(edgeR)
y <- DGEList(counts=counts)
y <- calcNormFactors(y)
# Column correct log gene lengths
# Columns of gl should add to zero
gl <- log(geneLength)
gl <- t(t(gl)-colMeans(gl))
# Combine library sizes, norm factors and gene lengths:
offset <- expandAsMatrix(getOffset(y))
offset <- offset + gl
Then
y$offset <- offset
y <- estimateGLMCommonDisp(y,design)
etc.
Note that I have not tried this myself. It should work in principle from
a differential expression point of view.
On the other hand, there may be side effects regarding dispersion trend
estimation -- I do not have time to explore this.
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Wed, 27 Aug 2014, assaf www wrote:
> Probably wrong, but the reason I thought of using quantile normalization is
> the need to correct both for the species-length, and library size.
>
>
> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> That doesn't look helpful to me. I suggested that you incorporate gene
>> lengths into the offsets, not do quantile normalization of cpms.
>>
>> Sorry, I just don't have time to develop a code example for you. I hope
>> someone else will help.
>>
>> The whole topic of interspecies differential expression is a can of worms.
>> Even if you adjust for gene length, there will still be differences in gc
>> content and mappability between the species.
>>
>> Gordon
>>
>>
>> On Wed, 27 Aug 2014, assaf www wrote:
>>
>> Dear Gordon thanks,
>>>
>>> Suppose I start with the following matrices:
>>>
>>> # 'counts' is the Rsem filtered counts
>>>
>>>> counts[1:4,]
>>>>
>>> h0 h1 h2 n0 n1 n2
>>> ENSRNOG00000000021 36 17 20 10 25 38
>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>> ENSRNOG00000000028 26 12 11 18 23 28
>>> ENSRNOG00000000029 22 13 12 16 17 15
>>>
>>> # 'geneLength' is the species-specific gene lengths, for species 'h' and
>>> 'n':
>>>
>>>> geneLength[1:3,]
>>>>
>>> h0.length h1.length h2.length n0.length n1.length
>>> n2.length
>>> ENSRNOG00000000021 1200 1200 1200 1303 1303
>>> 1303
>>> ENSRNOG00000000024 1050 1050 1050 3080 3080
>>> 3080
>>> ENSRNOG00000000028 1047 1047 1047 1121 1121
>>> 1121
>>>
>>>
>>> does the following code look correct, and may allow allows across species
>>> analysis ?:
>>> (technically it works, I checked)
>>>
>>> # quantile normalization: (as in here:
>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>> digital+gene+expression
>>> )
>>>
>>> x1 = counts*1000/geneLength
>>> library(limma)
>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>> offset = log(counts+0.1)-log(x2+0.1)
>>>
>>> ...
>>>
>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>> fit <- glmFit(y,design,offset=offset)
>>>
>>> ...
>>>
>>>
>>> Thanks in advance for any help..,
>>> Assaf
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list