[Bioc-devel] mapping vector of gene ids to gene symbols

Tim Triche, Jr. tim.triche at gmail.com
Wed Jun 18 20:24:22 CEST 2014


Seconded, this would be so useful. I still use mget(), for heaven's sake.

Meanwhile, I'm going to try VariantFiltering. Thanks for starting this conversation, Michael.

--t

> On Jun 18, 2014, at 11:18 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
> 
> That is a good start. But for convenience, I would favor something that
> just returns the vector corresponding to "column" rather than a data.frame.
> 
> Thanks,
> Michael
> 
> 
> 
>> On Wed, Jun 18, 2014 at 10:11 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>> 
>> Hi Michael,
>> 
>> 
>>> On 06/18/2014 06:03 AM, Michael Lawrence wrote:
>>> 
>>> Let's say I have a vector of gene IDs where some are NA and some are
>>> repeated, and I want to map them to gene symbols, where I get NAs for the
>>> NA IDs or IDs without a symbol. What is the best way to do this?
>>> 
>>> I tried select() but it gave me a table with unique entries; not very
>>> convenient. It also does not handle NAs, and it breaks completely with
>>> duplicates when using the GENEID key type (it kind of works with ENTREZID):
>>> 
>>> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
>>> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939", :
>>>   269 elements in value to replace 1312 elements
>>> 
>>> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
>>> this returns a list and fails with NAs.
>>> 
>>> What would be nice is something like:
>>> 
>>> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>>> 
>>> where OneToOneOrNone is an assertion that I expect the mappings to be
>>> one-to-one, so it will unlist() or whatever and throw an error if the
>>> assertion fails. It should return NA for anything not found, and for any
>>> NA GENEID. Does something like this already exist?
>> 
>> Couldn't this be handled via an extra argument to select()?
>> 
>> I would suggest this argument be called something like 'ManyToOneOrNone'
>> or 'ManyToZeroOrOne' rather than 'OneToOneOrNone' (different keys
>> can be mapped to the same symbol and I guess that's fine).
>> 
>> In other words you want an option to force select() to return a
>> data.frame that is "parallel" to the vector of keys (i.e. 1 row
>> per key and in the same order, even when this vector contains NAs
>> and/or duplicates), or fail.
>> 
>> Kind of related to that discussion we had on the bioconductor list
>> about 1 year ago:
>> 
>>  https://stat.ethz.ch/pipermail/bioconductor/2013-July/054056.html
>> 
>> Cheers,
>> H.
>> 
>> 
>>> Michael
>>> 
>>> 
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> --
>> Hervé Pagès
>> 
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> 
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
> 

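For reference, here is a minimal base R sketch of the "parallel" lookup requested in the thread above: one result per key, in the same order as the keys, NA for NA or unmapped keys, and an error if a requested key has more than one mapping. The helper name map_parallel, its arguments, and the toy lookup table are assumptions made for illustration only; this is not an AnnotationDbi function, and it does not query Homo.sapiens or org.Hs.egSYMBOL.

```r
# Hypothetical helper (not part of AnnotationDbi): map keys to values
# "in parallel", i.e. one result per key, in the same order as the keys.
map_parallel <- function(keys, table, keycol, valcol) {
  keys <- as.character(keys)
  # Keep distinct, complete (key, value) pairs only.
  tab <- unique(table[, c(keycol, valcol)])
  tab <- tab[!is.na(tab[[keycol]]) & !is.na(tab[[valcol]]), , drop = FALSE]
  # Assert many-to-(zero or one): fail if a requested key has several values.
  multi <- intersect(keys, tab[[keycol]][duplicated(tab[[keycol]])])
  if (length(multi) > 0L) {
    stop("keys with more than one mapping: ", paste(multi, collapse = ", "))
  }
  # match() gives NA for NA keys and for keys absent from the table,
  # so indexing with it yields NA in exactly those positions.
  tab[[valcol]][match(keys, tab[[keycol]])]
}

# Toy lookup table, for illustration only.
lookup <- data.frame(
  GENEID = c("672", "7157"),
  SYMBOL = c("BRCA1", "TP53"),
  stringsAsFactors = FALSE
)

# Keys containing an NA, a duplicate, and an ID absent from the table.
GENEID <- c("7157", NA, "672", "7157", "0")
symbols <- map_parallel(GENEID, lookup, "GENEID", "SYMBOL")
print(symbols)
```

Because the result is a plain vector of the same length as the keys, it can be assigned directly as a column next to the original IDs. Different keys mapping to the same symbol are allowed; only a key with several symbols triggers the error.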