[BioC] How to get gene symbol after deseq?

Fri Feb 7 22:52:30 CET 2014

Hi Fabrice,

On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
> Dear experts,
>
> After running deseq, I got a list of genes. They look like the
> following.
>
>> resSig[,1]
> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
> [2] "ENSMUSG00000026727:016"
> [3] "ENSMUSG00000026727:004"
> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
> [5] "ENSMUSG00000025730:010"
> [6] "ENSMUSG00000005836:007"
> [7] "ENSMUSG00000073139:001"
>
> How can I get back the gene symbol for each ID?

 > gns
[1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
[2] "ENSMUSG00000026727:016"
[3] "ENSMUSG00000026727:004"
[4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
[5] "ENSMUSG00000025730:010"
[6] "ENSMUSG00000005836:007"
[7] "ENSMUSG00000073139:001"

 > gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
 > gns2
[1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
[4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
[7] "ENSMUSG00000073139"
 > library(Mus.musculus)
 > select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
              ENSEMBL   SYMBOL
1 ENSMUSG00000022150     Dab2
2 ENSMUSG00000026727     Rsu1
5 ENSMUSG00000025730   Rab40c
6 ENSMUSG00000005836    Gata6
7 ENSMUSG00000073139 BC023829

Note that for the merged entries (two IDs joined by "+") I am keeping 
only the first Ensembl Gene ID and discarding the second. You could do 
something more sophisticated to capture the additional IDs, but I'll 
leave that for you to figure out.
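
For anyone who does want to keep every Ensembl ID from the merged entries, here is one possible sketch in base R. It assumes the IDs always have the form shown above (one or more gene IDs joined by "+", followed by ":" and a numeric suffix); the names gene_part, gene_list, id_map and keys are made up for illustration.

```r
# Feature IDs copied from the example above
gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016",
         "ENSMUSG00000026727:004",
         "ENSMUSG00000022150+ENSMUSG00000079102:015",
         "ENSMUSG00000025730:010",
         "ENSMUSG00000005836:007",
         "ENSMUSG00000073139:001")

# Drop the ":NNN" suffix, then split merged entries on the literal "+"
gene_part <- sub(":.*$", "", gns)
gene_list <- strsplit(gene_part, "+", fixed = TRUE)

# One row per (original feature ID, Ensembl gene ID) pair,
# so merged entries contribute one row for each gene
id_map <- data.frame(feature = rep(gns, lengths(gene_list)),
                     ENSEMBL = unlist(gene_list),
                     stringsAsFactors = FALSE)

# Unique Ensembl gene IDs to use as lookup keys
keys <- unique(id_map$ENSEMBL)
```

The keys vector could then be passed to select() in place of gns2, and the result joined back to id_map with merge(..., by = "ENSEMBL") to get the symbol(s) for each original ID. Not every Ensembl ID is guaranteed to map to a symbol.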

Best,

Jim

>
> Thank you very much in advance.

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099