[BioC] Annotation problem through org.Mm.eg.db
Marc Carlson
mcarlson at fhcrc.org
Wed Mar 20 21:10:43 CET 2013
Hi Himanshu,
The org.Mm.eg.db package has records for 58021 entrez gene IDs. How do
I know that? Well:
library(org.Mm.eg.db)
k = keys(org.Mm.eg.db, keytype="ENTREZID")
length(k)
So how many of those have a gene symbol attached? Well it looks like
they basically all map to something:
res = select(org.Mm.eg.db, keys=k, columns="SYMBOL", keytype="ENTREZID")
dim(res)
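One caveat worth checking here: select() returns a row even for a key that has no match, with NA in the requested column, so dim(res) on its own does not tell you how many IDs actually have a symbol. A minimal sketch of the check, using a small made-up stand-in for res (the IDs and symbols below are hypothetical, not taken from org.Mm.eg.db):

```r
# Hypothetical stand-in for the data frame returned by select()
res_toy <- data.frame(
  ENTREZID = c("101", "102", "103", "104"),
  SYMBOL   = c("GeneA", NA, "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Count the IDs with and without a symbol
n_mapped   <- sum(!is.na(res_toy$SYMBOL))
n_unmapped <- sum(is.na(res_toy$SYMBOL))

# List the IDs that have no symbol attached
unmapped_ids <- res_toy$ENTREZID[is.na(res_toy$SYMBOL)]
```

The same three lines should work on the real result by swapping res_toy for res.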
These are still gene symbols, though, and as such they are not
guaranteed to be unique. So it's not surprising if some of them are
shared by different genes... :(
length(res[["SYMBOL"]])
length(unique(res[["SYMBOL"]]))
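If you want to see which symbols are shared, rather than just how many, something along these lines should do it. This is only a sketch on a made-up stand-in for res (hypothetical IDs and symbols); NA values are dropped first so that unmapped IDs are not counted as one shared "symbol":

```r
# Hypothetical stand-in for the data frame returned by select()
res_toy <- data.frame(
  ENTREZID = c("101", "102", "103", "104", "105"),
  SYMBOL   = c("GeneA", "GeneB", "GeneA", "GeneC", NA),
  stringsAsFactors = FALSE
)

# Ignore unmapped IDs before looking for repeats
sym <- res_toy$SYMBOL[!is.na(res_toy$SYMBOL)]

# Same arithmetic as comparing length() with length(unique())
n_extra <- length(sym) - length(unique(sym))

# Symbols that occur more than once, and the rows that carry them
dup_syms <- unique(sym[duplicated(sym)])
shared   <- res_toy[res_toy$SYMBOL %in% dup_syms, ]
```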
But not too many actually. Only 284 in fact. So this all raises
another question. Specifically: what is going on with your ids? Why
are so many of them not matching up with any sort of symbol? My best
guess is that some of them are not really mouse entrez gene ids. So
what happens if you take your list of ids and do this:
table(ids %in% k)
And are you sure that your ids are really supposed to be entrez gene IDs?
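A rough way to investigate is sketched below. The vectors ids_toy and k_toy are made-up stand-ins for your ids and for k, and the guesses about what might be wrong (stray whitespace, factors, Ensembl-style identifiers) are assumptions on my part, not something I know about your data:

```r
# Hypothetical input: a mix of ID styles, as might come out of an upstream pipeline
ids_toy <- c(" 12345", "67890 ", "ENSMUSG00000000001", "NM_000000", "11111")
# Hypothetical stand-in for keys(org.Mm.eg.db, keytype="ENTREZID")
k_toy <- c("12345", "67890", "22222")

# Coerce to character (guards against factors) and strip stray whitespace
ids_clean <- trimws(as.character(ids_toy))

# Entrez gene IDs are plain integers; anything else is probably another ID type
looks_entrez  <- grepl("^[0-9]+$", ids_clean)
looks_ensembl <- grepl("^ENSMUSG", ids_clean)
table(looks_entrez)

# Of the integer-looking IDs, which ones are absent from the package?
in_db       <- ids_clean %in% k_toy
missing_ids <- ids_clean[looks_entrez & !in_db]
```

If most of your ids turn out to look like Ensembl gene IDs, then calling select() with keytype="ENSEMBL" instead of "ENTREZID" would be the thing to try.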
Marc
On 03/19/2013 08:18 AM, Himanshu Sharma wrote:
> Dear Mailing list,
> I have mouse gene entrez ids after RNAseq analysis from RSEM and edgeR. I have 13354 gene ids and I am trying to get the gene symbol for the same. I have been doing the following :
>
> symbol<- select(org.Mm.eg.db, keys=ids, keytype="ENTREZID", cols="SYMBOL")
> where ids contains the list of 13354 gene ids
>
> But when I look at the result, I get symbols for only half or fewer of the gene ids.
> Is there a better way to map these ids to gene symbols?
>
> Thanks in advance,
> Himanshu