[BioC] How to get gene symbol after deseq?

James W. MacDonald jmacdon at uw.edu
Fri Feb 7 23:50:30 CET 2014


Hi Fabrice,

I don't know. It might have something to do with that gene being a 
predicted gene; perhaps we don't annotate such things. Or it may have 
been added to Ensembl after we built the org.Mm.eg.db package.

If you require the most recent data, you can always build your own 
package using makeOrgPackageFromNCBI() in the AnnotationForge package, 
or use biomaRt.
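
For the biomaRt route, a minimal sketch might look like the following. It
assumes the biomaRt package is installed and that the Ensembl mart is
reachable from your machine. The function name lookup_symbols is made up
for illustration, and the attribute names (ensembl_gene_id, mgi_symbol)
are the ones the mouse dataset normally exposes; check
listAttributes(mart) if they have changed.

```r
# Sketch only: needs the biomaRt package and a network connection to Ensembl.
# 'lookup_symbols' is a hypothetical helper name, not part of any package.
lookup_symbols <- function(ids) {
  # Connect to the mouse gene dataset in the Ensembl mart
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  # Ask for the MGI symbol of each Ensembl gene ID
  biomaRt::getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                 filters = "ensembl_gene_id",
                 values = unique(ids),
                 mart = mart)
}
```

Calling lookup_symbols("ENSMUSG00000082538") would then query the current
Ensembl release rather than the snapshot in org.Mm.eg.db.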

Best,

Jim



On Friday, February 07, 2014 5:41:13 PM, Fabrice Tourre wrote:
> Jim,
>
> Thank you very much. It makes sense to me.
>
> One small question: why is ENSMUSG00000082538 given NA when it has
> the symbol Gm14704 in Ensembl?
>
> http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300
>
>
>
> On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Fabrice,
>>
>>
>>
>> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>>
>>> Dear experts,
>>>
>>> After running DESeq, I got a list of genes. They look something like
>>> this:
>>>
>>> > resSig[,1]
>>>
>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>> [2] "ENSMUSG00000026727:016"
>>> [3] "ENSMUSG00000026727:004"
>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>> [5] "ENSMUSG00000025730:010"
>>> [6] "ENSMUSG00000005836:007"
>>> [7] "ENSMUSG00000073139:001"
>>>
>>> How can I get back the gene symbol for each ID?
>>
>>
>> > gns
>>
>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>> [2] "ENSMUSG00000026727:016"
>> [3] "ENSMUSG00000026727:004"
>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>> [5] "ENSMUSG00000025730:010"
>> [6] "ENSMUSG00000005836:007"
>> [7] "ENSMUSG00000073139:001"
>>
>> > gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>> > gns2
>> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
>> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
>> [7] "ENSMUSG00000073139"
>> > library(Mus.musculus)
>> > select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>>               ENSEMBL   SYMBOL
>> 1 ENSMUSG00000022150     Dab2
>> 2 ENSMUSG00000026727     Rsu1
>> 5 ENSMUSG00000025730   Rab40c
>> 6 ENSMUSG00000005836    Gata6
>> 7 ENSMUSG00000073139 BC023829
>>
>> Note that I am discarding the second Ensembl Gene ID. You could do something
>> more sophisticated to capture duplicated IDs, but I'll leave that for you to
>> figure out.
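
On keeping the second Ensembl Gene ID rather than discarding it: one
possible approach, sketched in base R only, is to build a long table with
one row per (original ID, gene ID) pair. The name id_map is made up for
illustration, and the two IDs are copied from the list above.

```r
# IDs in the format shown in the thread: one or more gene IDs joined by "+",
# followed by ":" and a numeric suffix
gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016")

# Drop the ":NNN" suffix, then split on "+" so every gene ID is kept
genes <- strsplit(sub(":.*$", "", gns), "+", fixed = TRUE)

# Long table: one row per (original ID, Ensembl gene ID) pair
id_map <- data.frame(id = rep(gns, lengths(genes)),
                     ENSEMBL = unlist(genes),
                     stringsAsFactors = FALSE)
```

unique(id_map$ENSEMBL) can then be passed to select() as before, and the
result joined back to id_map with merge() on the ENSEMBL column.
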
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> Thank you very much in advance.
>>>
>>
>>

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
