[BioC] How to get gene symbol after deseq?
James W. MacDonald
jmacdon at uw.edu
Fri Feb 7 23:50:30 CET 2014
Hi Fabrice,
I don't know. It might have something to do with that gene being a
predicted gene. Perhaps we don't annotate such things? Or it may have
been added to Ensembl between when we built the org.Mm.eg.db package
and now.
If you require the most recent data, you can always build your own
package using makeOrgPackageFromNCBI() in the AnnotationForge package,
or use biomaRt.
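A minimal sketch of the biomaRt route, assuming biomaRt is installed, the Ensembl BioMart server is reachable, and the `mmusculus_gene_ensembl` dataset still exposes the `ensembl_gene_id` filter and `mgi_symbol` attribute. The helper name `symbols_from_biomart()` is hypothetical. Because it queries the live server, the answer reflects the current Ensembl release rather than the snapshot built into org.Mm.eg.db.

```r
# Hypothetical helper: look up MGI symbols for mouse Ensembl gene IDs via biomaRt.
# Assumes biomaRt is installed and the Ensembl BioMart server is reachable.
symbols_from_biomart <- function(ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("biomaRt is not installed")
  }
  # Connect to the Ensembl mouse gene dataset (current release)
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  # Filter on Ensembl gene IDs; return each ID with its MGI symbol
  biomaRt::getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                 filters = "ensembl_gene_id",
                 values = unique(ids),
                 mart = mart)
}

# Example call (needs network access):
# symbols_from_biomart("ENSMUSG00000082538")
```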
Best,
Jim
On Friday, February 07, 2014 5:41:13 PM, Fabrice Tourre wrote:
> Jim,
>
> Thank you very much. It makes sense to me.
>
> One small question: why is ENSMUSG00000082538 given NA when Ensembl
> gives it the symbol Gm14704?
>
> http://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000082538;r=X:70403181-70404054;t=ENSMUST00000120300
>
> On Fri, Feb 7, 2014 at 4:52 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Fabrice,
>>
>> On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
>>>
>>> Dear experts,
>>>
>>> After running DESeq, I got a list of genes like the following:
>>>
>>>> resSig[,1]
>>>
>>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>>> [2] "ENSMUSG00000026727:016"
>>> [3] "ENSMUSG00000026727:004"
>>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>>> [5] "ENSMUSG00000025730:010"
>>> [6] "ENSMUSG00000005836:007"
>>> [7] "ENSMUSG00000073139:001"
>>>
>>> How can I get back the gene symbol for each ID?
>>
>>
>>> gns
>>
>> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
>> [2] "ENSMUSG00000026727:016"
>> [3] "ENSMUSG00000026727:004"
>> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
>> [5] "ENSMUSG00000025730:010"
>> [6] "ENSMUSG00000005836:007"
>> [7] "ENSMUSG00000073139:001"
>>
>>> gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
>>> gns2
>> [1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
>> [4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
>> [7] "ENSMUSG00000073139"
>>> select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
>> ENSEMBL SYMBOL
>> 1 ENSMUSG00000022150 Dab2
>> 2 ENSMUSG00000026727 Rsu1
>> 5 ENSMUSG00000025730 Rab40c
>> 6 ENSMUSG00000005836 Gata6
>> 7 ENSMUSG00000073139 BC023829
>>
>> Note that I am discarding the second Ensembl Gene ID. You could do something
>> more sophisticated to capture duplicated IDs, but I'll leave that for you to
>> figure out.
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> Thank you very much in advance.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
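The `strsplit()` call in the quoted reply keeps only the first Ensembl gene ID of aggregated entries such as `ENSMUSG00000022150+ENSMUSG00000079102:009`. One way to keep all of them, sketched under the assumption that every ID has the form `GENE[+GENE...]:BIN` as in the output above: strip the bin suffix, split on `+`, look each gene up once, then paste the symbols back together. `gene_lists()` and `collapse_symbols()` are hypothetical helpers; the lookup uses `mapIds()` from AnnotationDbi with org.Mm.eg.db, which assumes a Bioconductor release recent enough to provide `mapIds()`.

```r
# Hypothetical helper: split "GENE1+GENE2:BIN" IDs into a list of gene IDs.
gene_lists <- function(ids) {
  # Drop the trailing ":BIN" part, then split aggregated genes on "+"
  strsplit(sub(":[^:]*$", "", ids), "+", fixed = TRUE)
}

# Hypothetical helper: one symbol string per ID.
# 'lookup' is a named character vector: names = Ensembl gene IDs, values = symbols.
collapse_symbols <- function(ids, lookup) {
  vapply(gene_lists(ids), function(g) {
    s <- unname(lookup[g])      # NA for genes missing from the lookup
    s <- unique(s[!is.na(s)])   # drop unmapped genes and repeats
    if (length(s) == 0) NA_character_ else paste(s, collapse = "+")
  }, character(1))
}

gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016",
         "ENSMUSG00000073139:001")

# Build the lookup from org.Mm.eg.db only if it is installed
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Mm.eg.db))
  all_genes <- unique(unlist(gene_lists(gns)))
  lookup <- mapIds(org.Mm.eg.db, keys = all_genes, column = "SYMBOL",
                   keytype = "ENSEMBL", multiVals = "first")
  print(data.frame(id = gns, symbol = collapse_symbols(gns, lookup)))
}
```

Keeping the lookup as a plain named vector means the same `collapse_symbols()` works whether the symbols come from org.Mm.eg.db or from a biomaRt query.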