[BioC] mis-matched gene symbols and entrez ID in biomaRt
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Sep 7 05:42:57 CEST 2011
Hi,
On Tue, Sep 6, 2011 at 11:24 PM, Wendy Qiao <wendy2.qiao at gmail.com> wrote:
> Hi all,
>
> I am converting the HGNC symbols from an Illumina human array to Entrez IDs
> using biomaRt. I found that some gene symbols are matched to many
> Entrez IDs, and vice versa. I am wondering how to solve this problem, so that
> each gene symbol is matched to only one Entrez ID. Or is there another
> package that I can use for matching gene symbols to Entrez IDs? Thank you in
> advance.
>
> Wendy
>
> =====
> In the following example, BAGE2, 3, 4 and 5 are each matched to both 85316
> and 85317, which are the Entrez IDs of BAGE5 and BAGE4, respectively.

Not sure why that's happening (out of curiosity, is ensembl_mart_51 an
older version of the db? I hardly ever use biomaRt, it seems).

Anyway, it looks like using the org.Hs.eg.db package would work:

R> library(org.Hs.eg.db)
R> mget(paste("BAGE", 2:5, sep=""), revmap(org.Hs.egSYMBOL), ifnotfound=NA)
$BAGE2
[1] "85319"
$BAGE3
[1] "85318"
$BAGE4
[1] "85317"
$BAGE5
[1] "85316"
... and you get the added bonus of not having to fire your query "over
the wire".
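
If you want to see which symbols are affected in the biomaRt result first, here is a minimal base-R sketch. It assumes a data frame shaped like the `Entrez` table quoted below; I retyped the 20 rows by hand, and the names `mapped`, `ids_per_symbol`, `symbols_per_id`, `ambiguous_symbols`, `ambiguous_ids` and `one_to_one` are just illustrative, not anything from biomaRt:

```r
# Rows 1-20 of the getBM() result quoted in this thread, retyped by hand.
Entrez <- data.frame(
  hgnc_symbol = c("ZFP62", "C9orf169", "FAM72D", "HMX1", "HMX1", "ZFP62",
                  "RSPO4", "DOC2B", "C8orf42", "TTTY8", "A26C3",
                  "BAGE5", "BAGE4", "BAGE3", "BAGE2",
                  "BAGE5", "BAGE4", "BAGE3", "BAGE2", "NBR1"),
  entrezgene  = c(92379, 375791, 653573, NA, 3166, NA,
                  343637, 8447, 157695, NA, NA,
                  85316, 85316, 85316, 85316,
                  85317, 85317, 85317, 85317, 4077),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID, then remove exact duplicate pairs.
mapped <- unique(Entrez[!is.na(Entrez$entrezgene), ])

# Count distinct Entrez IDs per symbol, and distinct symbols per Entrez ID.
ids_per_symbol <- tapply(mapped$entrezgene, mapped$hgnc_symbol,
                         function(x) length(unique(x)))
symbols_per_id <- tapply(mapped$hgnc_symbol, mapped$entrezgene,
                         function(x) length(unique(x)))

# Symbols (and IDs) that take part in a one-to-many mapping.
ambiguous_symbols <- names(ids_per_symbol)[ids_per_symbol > 1]
ambiguous_ids     <- names(symbols_per_id)[symbols_per_id > 1]

# Keep only the rows whose symbol maps to exactly one Entrez ID.
one_to_one <- mapped[!mapped$hgnc_symbol %in% ambiguous_symbols, ]
```

This only flags the ambiguous rows; it does not decide which ID is the right one, so the flagged symbols would still need to be resolved against another source such as org.Hs.eg.db.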
HTH,
-steve
>
> library('biomaRt')
> ensembl=useMart("ensembl_mart_51",dataset="hsapiens_gene_ensembl",archive=TRUE)
> Entrez<-getBM(attributes=c("hgnc_symbol","entrezgene"),filters="hgnc_symbol",values=GeneList,mart=ensembl)
> # class(GeneList) = factor
>
> Entrez[1:20,]
>    hgnc_symbol entrezgene
> 1        ZFP62      92379
> 2     C9orf169     375791
> 3       FAM72D     653573
> 4         HMX1         NA
> 5         HMX1       3166
> 6        ZFP62         NA
> 7        RSPO4     343637
> 8        DOC2B       8447
> 9      C8orf42     157695
> 10       TTTY8         NA
> 11       A26C3         NA
> 12       BAGE5      85316
> 13       BAGE4      85316
> 14       BAGE3      85316
> 15       BAGE2      85316
> 16       BAGE5      85317
> 17       BAGE4      85317
> 18       BAGE3      85317
> 19       BAGE2      85317
> 20        NBR1       4077
>
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact