[BioC] Interspecies differential expression of orthologs with Edger
assaf www
assafwww at gmail.com
Wed Aug 27 20:07:11 CEST 2014
Thanks Ryan, it seems you are correct
On Wed, Aug 27, 2014 at 7:39 PM, Ryan <rct at thompsonclan.org> wrote:
> Looking at the code for rpkm, it looks like it should work just fine for a
> matrix or a vector.
>
> On Wed Aug 27 09:30:11 2014, assaf www wrote:
>
>> Sorry I'm not clear:
>> for getting the FPKMs, I guess I only need to:
>> rpkm(y, gene.length=geneLength, normalized.lib.sizes=T, log=F)
>>
>> But gene.length should here be a vector, while in this case its a matrix.
>>
>>
>> Assaf
>>
>>
>> On Wed, Aug 27, 2014 at 6:14 PM, assaf www <assafwww at gmail.com> wrote:
>>
>> This is very helpful for me, thanks.
>>>
>>> A slight change that I made in the code you sent, to avoid some R erros,
>>> is
>>>
>>> # to replace:
>>> offset = offset + gl
>>> # with:
>>> offset = sweep(gl,2,offset,"+")
>>>
>>> In addition to differential expression tests, I wanted also to extract
>>> FPKMs values (and/or normalized CPM values), that would take into account
>>> all components of the offset (which if I'm not mistaken are:
>>> library_size +
>>> TMM + gene_size).
>>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>>> Is there a possibly "decent" solution for that ?
>>>
>>> all the best, and thanks,
>>> Assaf
>>>
>>>
>>>
>>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>> wrote:
>>>
>>> It works something like this:
>>>>
>>>> library(edgeR)
>>>> y <- DGEList(counts=counts)
>>>> y <- calcNormFactors(y)
>>>>
>>>> # Column correct log gene lengths
>>>> # Columns of gl should add to zero
>>>>
>>>> gl <- log(geneLength)
>>>> gl <- t(t(gl)-colMeans(gl))
>>>>
>>>> # Combine library sizes, norm factors and gene lengths:
>>>>
>>>> offset <- expandAsMatrix(getOffset(y))
>>>> offset <- offset + gl
>>>>
>>>> Then
>>>>
>>>> y$offset <- offset
>>>> y <- estimateGLMCommonDisp(y,design)
>>>>
>>>> etc.
>>>>
>>>> Note that I have not tried this myself. It should work in principle
>>>> from
>>>> a differential expression point of view.
>>>>
>>>> On the other hand, there may be side effects regarding dispersion trend
>>>> estimation -- I do not have time to explore this.
>>>>
>>>> Gordon
>>>>
>>>> ---------------------------------------------
>>>> Professor Gordon K Smyth,
>>>> Bioinformatics Division,
>>>> Walter and Eliza Hall Institute of Medical Research,
>>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>>> http://www.statsci.org/smyth
>>>>
>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>
>>>> Probably wrong, but the reason I thought of using quantile
>>>> normalization
>>>>
>>>>> is
>>>>> the need to correct both for the species-length, and library size.
>>>>>
>>>>>
>>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>>> wrote:
>>>>>
>>>>> That doesn't look helpful to me. I suggested that you incorporate
>>>>> gene
>>>>>
>>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>>
>>>>>> Sorry, I just don't have time to develop a code example for you. I
>>>>>> hope
>>>>>> someone else will help.
>>>>>>
>>>>>> The whole topic of interspecies differential expression is a can of
>>>>>> worms.
>>>>>> Even if you adjust for gene length, there will still be differences in
>>>>>> gc
>>>>>> content and mappability between the species.
>>>>>>
>>>>>> Gordon
>>>>>>
>>>>>>
>>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>>
>>>>>> Dear Gordon thanks,
>>>>>>
>>>>>>
>>>>>>> Suppose I start with the following matrices:
>>>>>>>
>>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>>
>>>>>>> counts[1:4,]
>>>>>>>
>>>>>>>>
>>>>>>>> h0 h1 h2 n0 n1 n2
>>>>>>>>
>>>>>>> ENSRNOG00000000021 36 17 20 10 25 38
>>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>>> ENSRNOG00000000028 26 12 11 18 23 28
>>>>>>> ENSRNOG00000000029 22 13 12 16 17 15
>>>>>>>
>>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>>> and
>>>>>>> 'n':
>>>>>>>
>>>>>>> geneLength[1:3,]
>>>>>>>
>>>>>>>>
>>>>>>>> h0.length h1.length h2.length n0.length
>>>>>>>> n1.length
>>>>>>>>
>>>>>>> n2.length
>>>>>>> ENSRNOG00000000021 1200 1200 1200 1303 1303
>>>>>>> 1303
>>>>>>> ENSRNOG00000000024 1050 1050 1050 3080 3080
>>>>>>> 3080
>>>>>>> ENSRNOG00000000028 1047 1047 1047 1121 1121
>>>>>>> 1121
>>>>>>>
>>>>>>>
>>>>>>> does the following code look correct, and may allow allows across
>>>>>>> species
>>>>>>> analysis ?:
>>>>>>> (technically it works, I checked)
>>>>>>>
>>>>>>> # quantile normalization: (as in here:
>>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>>> digital+gene+expression
>>>>>>> )
>>>>>>>
>>>>>>> x1 = counts*1000/geneLength
>>>>>>> library(limma)
>>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance for any help..,
>>>>>>> Assaf
>>>>>>>
>>>>>>>
>>>>>> ____________________________________________________________
>>>> __________
>>>> The information in this email is confidential and intended solely for
>>>> the
>>>> addressee.
>>>> You must not disclose, forward, print or use it without the permission
>>>> of
>>>> the sender.
>>>> ______________________________________________________________________
>>>>
>>>>
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.
>> science.biology.informatics.conductor
>>
>
[[alternative HTML version deleted]]
More information about the Bioconductor
mailing list