[BioC] Interspecies differential expression of orthologs with Edger
Ryan
rct at thompsonclan.org
Wed Aug 27 18:39:10 CEST 2014
Looking at the code for rpkm, it looks like it should work just fine
for a matrix or a vector.
On Wed Aug 27 09:30:11 2014, assaf www wrote:
> Sorry I'm not clear:
> for getting the FPKMs, I guess I only need to:
> rpkm(y, gene.length=geneLength, normalized.lib.sizes=T, log=F)
>
> But gene.length should here be a vector, while in this case its a matrix.
>
>
> Assaf
>
>
> On Wed, Aug 27, 2014 at 6:14 PM, assaf www <assafwww at gmail.com> wrote:
>
>> This is very helpful for me, thanks.
>>
>> A slight change that I made in the code you sent, to avoid some R erros,
>> is
>>
>> # to replace:
>> offset = offset + gl
>> # with:
>> offset = sweep(gl,2,offset,"+")
>>
>> In addition to differential expression tests, I wanted also to extract
>> FPKMs values (and/or normalized CPM values), that would take into account
>> all components of the offset (which if I'm not mistaken are: library_size +
>> TMM + gene_size).
>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>> Is there a possibly "decent" solution for that ?
>>
>> all the best, and thanks,
>> Assaf
>>
>>
>>
>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>>> It works something like this:
>>>
>>> library(edgeR)
>>> y <- DGEList(counts=counts)
>>> y <- calcNormFactors(y)
>>>
>>> # Column correct log gene lengths
>>> # Columns of gl should add to zero
>>>
>>> gl <- log(geneLength)
>>> gl <- t(t(gl)-colMeans(gl))
>>>
>>> # Combine library sizes, norm factors and gene lengths:
>>>
>>> offset <- expandAsMatrix(getOffset(y))
>>> offset <- offset + gl
>>>
>>> Then
>>>
>>> y$offset <- offset
>>> y <- estimateGLMCommonDisp(y,design)
>>>
>>> etc.
>>>
>>> Note that I have not tried this myself. It should work in principle from
>>> a differential expression point of view.
>>>
>>> On the other hand, there may be side effects regarding dispersion trend
>>> estimation -- I do not have time to explore this.
>>>
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>> Probably wrong, but the reason I thought of using quantile normalization
>>>> is
>>>> the need to correct both for the species-length, and library size.
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>> wrote:
>>>>
>>>> That doesn't look helpful to me. I suggested that you incorporate gene
>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>
>>>>> Sorry, I just don't have time to develop a code example for you. I hope
>>>>> someone else will help.
>>>>>
>>>>> The whole topic of interspecies differential expression is a can of
>>>>> worms.
>>>>> Even if you adjust for gene length, there will still be differences in
>>>>> gc
>>>>> content and mappability between the species.
>>>>>
>>>>> Gordon
>>>>>
>>>>>
>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>
>>>>> Dear Gordon thanks,
>>>>>
>>>>>>
>>>>>> Suppose I start with the following matrices:
>>>>>>
>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>
>>>>>> counts[1:4,]
>>>>>>>
>>>>>>> h0 h1 h2 n0 n1 n2
>>>>>> ENSRNOG00000000021 36 17 20 10 25 38
>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>> ENSRNOG00000000028 26 12 11 18 23 28
>>>>>> ENSRNOG00000000029 22 13 12 16 17 15
>>>>>>
>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>> and
>>>>>> 'n':
>>>>>>
>>>>>> geneLength[1:3,]
>>>>>>>
>>>>>>> h0.length h1.length h2.length n0.length n1.length
>>>>>> n2.length
>>>>>> ENSRNOG00000000021 1200 1200 1200 1303 1303
>>>>>> 1303
>>>>>> ENSRNOG00000000024 1050 1050 1050 3080 3080
>>>>>> 3080
>>>>>> ENSRNOG00000000028 1047 1047 1047 1121 1121
>>>>>> 1121
>>>>>>
>>>>>>
>>>>>> does the following code look correct, and may allow allows across
>>>>>> species
>>>>>> analysis ?:
>>>>>> (technically it works, I checked)
>>>>>>
>>>>>> # quantile normalization: (as in here:
>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>> digital+gene+expression
>>>>>> )
>>>>>>
>>>>>> x1 = counts*1000/geneLength
>>>>>> library(limma)
>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> Thanks in advance for any help..,
>>>>>> Assaf
>>>>>>
>>>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>>
>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list