[Bioc-devel] mapping vector of gene ids to gene symbols

Hervé Pagès hpages at fhcrc.org
Wed Jun 18 19:11:07 CEST 2014


Hi Michael,

On 06/18/2014 06:03 AM, Michael Lawrence wrote:
> Let's say I have a vector of gene IDs where some are NA, and some are
> repeated, and I want to map them to gene symbols, where I get NAs for the
> NA IDs or IDs without a symbol. What is the best way to do this?
>
> I tried select() but it gave me a table with unique entries; not very
> convenient. It also does not handle NAs. And totally breaks with duplicates
> using the GENEID key type (kind of works with ENTREZID):
>
> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939",
> :
>    269 elements in value to replace 1312 elements
>
> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
> this returns a list and fails with NAs.
>
> What would be nice is something like:
>
> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>
> where OneToOneOrNone is an assertion that I expect the mappings to be
> one-to-one, so it will unlist() or whatever and throw an error if the
> assertion fails. It should return NA for anything not found, and for any NA
> GENEID. Does something like this already exist?

Couldn't this be handled via an extra argument to select()?

I would suggest this argument be called something like 'ManyToOneOrNone'
or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
can be mapped to the same symbol and I guess that's fine).

In other words you want an option to force select() to return a
data.frame that is "parallel" to the vector of keys (i.e. 1 row
per key and in the same order, even when this vector contains NAs
and/or duplicates), or fail.
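
To make the "parallel or fail" idea concrete, here is a minimal base-R
sketch. It is not an existing AnnotationDbi feature: map_parallel() is a
hypothetical helper, and the toy table below only stands in for the kind
of data.frame that select() returns.

```r
# Hypothetical helper (not part of AnnotationDbi): returns a vector that
# is parallel to 'keys', or fails if a key maps to more than one value.
# 'tbl' stands in for the data.frame that select() would return.
map_parallel <- function(tbl, keys, keycol, valcol) {
  # Keep only complete (key, value) pairs and drop exact duplicates.
  ok <- !is.na(tbl[[keycol]]) & !is.na(tbl[[valcol]])
  pairs <- unique(tbl[ok, c(keycol, valcol)])
  # Assertion: each key maps to at most one value (many-to-zero-or-one).
  dup <- duplicated(pairs[[keycol]])
  if (any(dup))
    stop("keys mapped to more than one ", valcol, ": ",
         paste(unique(pairs[[keycol]][dup]), collapse = ", "))
  # match() yields NA for NA keys and for keys with no mapping, and
  # keeps the order and the duplicates of 'keys'.
  pairs[[valcol]][match(keys, pairs[[keycol]])]
}

# Toy lookup table; "999999" below is a made-up ID with no symbol.
toy <- data.frame(GENEID = c("1", "2", "7157"),
                  SYMBOL = c("A1BG", "A2M", "TP53"),
                  stringsAsFactors = FALSE)
keys <- c("7157", NA, "1", "7157", "999999")
res <- map_parallel(toy, keys, "GENEID", "SYMBOL")
print(res)
```

The result has one element per key, in the same order, with NA for the
NA key and for the unmapped one. Nothing here requires the values to be
unique, so several keys may share a symbol. In practice 'tbl' could
presumably come from calling select() on the unique non-NA keys, but
that part is an assumption and is not tested here.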

Kind of related to that discussion we had on the bioconductor list
about 1 year ago:

   https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html

Cheers,
H.

>
> Michael

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


