[BioC] How to get gene symbol after deseq?
James W. MacDonald
jmacdon at uw.edu
Fri Feb 7 22:52:30 CET 2014
Hi Fabrice,
On 2/7/2014 4:08 PM, Fabrice Tourre wrote:
> Dear experts,
>
> After running deseq, I got a list of genes. They look something like
> the following.
>
>> resSig[,1]
> [1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
> [2] "ENSMUSG00000026727:016"
> [3] "ENSMUSG00000026727:004"
> [4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
> [5] "ENSMUSG00000025730:010"
> [6] "ENSMUSG00000005836:007"
> [7] "ENSMUSG00000073139:001"
>
> How can I get back the gene symbol for each ID?
> gns
[1] "ENSMUSG00000022150+ENSMUSG00000079102:009"
[2] "ENSMUSG00000026727:016"
[3] "ENSMUSG00000026727:004"
[4] "ENSMUSG00000022150+ENSMUSG00000079102:015"
[5] "ENSMUSG00000025730:010"
[6] "ENSMUSG00000005836:007"
[7] "ENSMUSG00000073139:001"
> gns2 <- sapply(strsplit(gns, "\\+|:"), "[", 1)
> gns2
[1] "ENSMUSG00000022150" "ENSMUSG00000026727" "ENSMUSG00000026727"
[4] "ENSMUSG00000022150" "ENSMUSG00000025730" "ENSMUSG00000005836"
[7] "ENSMUSG00000073139"
> library(Mus.musculus)
> select(Mus.musculus, gns2, "SYMBOL", "ENSEMBL")
             ENSEMBL   SYMBOL
1 ENSMUSG00000022150     Dab2
2 ENSMUSG00000026727     Rsu1
5 ENSMUSG00000025730   Rab40c
6 ENSMUSG00000005836    Gata6
7 ENSMUSG00000073139 BC023829
Note that I am discarding the second Ensembl Gene ID. You could do
something more sophisticated to capture duplicated IDs, but I'll leave
that for you to figure out.
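
For what it's worth, one possible way to keep both IDs is sketched below in base R. This is only an illustration under some assumptions, not necessarily what you would want in practice: it assumes that "+" always separates aggregated gene IDs and that everything after ":" is a suffix that can be dropped. The names gene_part, id_list, id_map and all_ids are made up for this example.

```r
# IDs as they appear in the original post
gns <- c("ENSMUSG00000022150+ENSMUSG00000079102:009",
         "ENSMUSG00000026727:016",
         "ENSMUSG00000026727:004",
         "ENSMUSG00000022150+ENSMUSG00000079102:015",
         "ENSMUSG00000025730:010",
         "ENSMUSG00000005836:007",
         "ENSMUSG00000073139:001")

# Drop the ":NNN" suffix, then split aggregated genes on a literal "+"
gene_part <- sub(":.*$", "", gns)
id_list <- strsplit(gene_part, "+", fixed = TRUE)

# Long table: one row per (original ID, Ensembl gene ID) pair
id_map <- data.frame(
  id = rep(gns, lengths(id_list)),
  ENSEMBL = unlist(id_list),
  stringsAsFactors = FALSE
)

# Unique Ensembl gene IDs, including the second ID of each aggregated pair
all_ids <- unique(id_map$ENSEMBL)

# Assumed follow-up (requires the Mus.musculus package, so left commented out):
# sym <- select(Mus.musculus, all_ids, "SYMBOL", "ENSEMBL")
# merge(id_map, sym, by = "ENSEMBL", all.x = TRUE)
```

The long format keeps the link between each original ID and every gene it contains, so the symbols can be merged back without losing the second ID.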
Best,
Jim
>
> Thank you very much in advance.
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099