[BioC] Interspecies differential expression of orthologs with Edger

Wed Aug 27 20:07:11 CEST 2014

Thanks Ryan, it seems you are correct

On Wed, Aug 27, 2014 at 7:39 PM, Ryan <rct at thompsonclan.org> wrote:

> Looking at the code for rpkm, it looks like it should work just fine for a
> matrix or a vector.
>
> On Wed Aug 27 09:30:11 2014, assaf www wrote:
>
>> Sorry I'm not clear:
>>   for getting the FPKMs, I guess I only need to:
>> rpkm(y, gene.length=geneLength, normalized.lib.sizes=T, log=F)
>>
>> But gene.length should here be a vector, while in this case its a matrix.
>>
>>
>> Assaf
>>
>>
>> On Wed, Aug 27, 2014 at 6:14 PM, assaf www <assafwww at gmail.com> wrote:
>>
>>  This is very helpful for me, thanks.
>>>
>>> A slight change that I made in the code you sent, to avoid some R erros,
>>> is
>>>
>>> # to replace:
>>> offset = offset + gl
>>> # with:
>>> offset = sweep(gl,2,offset,"+")
>>>
>>> In addition to differential expression tests, I wanted also to extract
>>> FPKMs values (and/or normalized CPM values), that would take into account
>>> all components of the offset (which if I'm not mistaken are:
>>> library_size +
>>> TMM + gene_size).
>>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>>> Is there a possibly "decent" solution for that ?
>>>
>>> all the best, and thanks,
>>> Assaf
>>>
>>>
>>>
>>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>> wrote:
>>>
>>>  It works something like this:
>>>>
>>>>    library(edgeR)
>>>>    y <- DGEList(counts=counts)
>>>>    y <- calcNormFactors(y)
>>>>
>>>> # Column correct log gene lengths
>>>> # Columns of gl should add to zero
>>>>
>>>>    gl <- log(geneLength)
>>>>    gl <- t(t(gl)-colMeans(gl))
>>>>
>>>> # Combine library sizes, norm factors and gene lengths:
>>>>
>>>>    offset <- expandAsMatrix(getOffset(y))
>>>>    offset <- offset + gl
>>>>
>>>> Then
>>>>
>>>>    y$offset <- offset
>>>>    y <- estimateGLMCommonDisp(y,design)
>>>>
>>>> etc.
>>>>
>>>> Note that I have not tried this myself.  It should work in principle
>>>> from
>>>> a differential expression point of view.
>>>>
>>>> On the other hand, there may be side effects regarding dispersion trend
>>>> estimation -- I do not have time to explore this.
>>>>
>>>> Gordon
>>>>
>>>> ---------------------------------------------
>>>> Professor Gordon K Smyth,
>>>> Bioinformatics Division,
>>>> Walter and Eliza Hall Institute of Medical Research,
>>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>>> http://www.statsci.org/smyth
>>>>
>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>
>>>>   Probably wrong, but the reason I thought of using quantile
>>>> normalization
>>>>
>>>>> is
>>>>> the need to correct both for the species-length, and library size.
>>>>>
>>>>>
>>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>>> wrote:
>>>>>
>>>>>   That doesn't look helpful to me.  I suggested that you incorporate
>>>>> gene
>>>>>
>>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>>
>>>>>> Sorry, I just don't have time to develop a code example for you.  I
>>>>>> hope
>>>>>> someone else will help.
>>>>>>
>>>>>> The whole topic of interspecies differential expression is a can of
>>>>>> worms.
>>>>>> Even if you adjust for gene length, there will still be differences in
>>>>>> gc
>>>>>> content and mappability between the species.
>>>>>>
>>>>>> Gordon
>>>>>>
>>>>>>
>>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>>
>>>>>>   Dear Gordon thanks,
>>>>>>
>>>>>>
>>>>>>> Suppose I start with the following matrices:
>>>>>>>
>>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>>
>>>>>>>   counts[1:4,]
>>>>>>>
>>>>>>>>
>>>>>>>>                       h0  h1  h2  n0  n1  n2
>>>>>>>>
>>>>>>> ENSRNOG00000000021   36  17  20  10  25  38
>>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>>> ENSRNOG00000000028   26  12  11  18  23  28
>>>>>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>>>>>
>>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>>> and
>>>>>>> 'n':
>>>>>>>
>>>>>>>   geneLength[1:3,]
>>>>>>>
>>>>>>>>
>>>>>>>>                     h0.length h1.length h2.length n0.length
>>>>>>>> n1.length
>>>>>>>>
>>>>>>> n2.length
>>>>>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>>>>>> 1303
>>>>>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>>>>>> 3080
>>>>>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>>>>>> 1121
>>>>>>>
>>>>>>>
>>>>>>> does the following code look correct, and may allow allows across
>>>>>>> species
>>>>>>> analysis ?:
>>>>>>> (technically it works, I checked)
>>>>>>>
>>>>>>> # quantile normalization: (as in here:
>>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>>> digital+gene+expression
>>>>>>> )
>>>>>>>
>>>>>>> x1 = counts*1000/geneLength
>>>>>>> library(limma)
>>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance for any help..,
>>>>>>> Assaf
>>>>>>>
>>>>>>>
>>>>>>  ____________________________________________________________
>>>> __________
>>>> The information in this email is confidential and intended solely for
>>>> the
>>>> addressee.
>>>> You must not disclose, forward, print or use it without the permission
>>>> of
>>>> the sender.
>>>> ______________________________________________________________________
>>>>
>>>>
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.
>> science.biology.informatics.conductor
>>
>

	[[alternative HTML version deleted]]