[Bioc-devel] mapping vector of gene ids to gene symbols

Robert Castelo robert.castelo at upf.edu
Wed Jun 18 15:20:57 CEST 2014


hi Michael,

this sounds as if you had a list of variants annotated with their 
Entrez Gene IDs, which are sometimes NA because those variants do not 
overlap any gene, and sometimes repeated because two or more of those 
variants overlap the same gene :)

at least that is the situation i had when programming the 
VariantFiltering package. i could not find a one-liner solution either, 
but you might want to look at what i ended up doing there, in case it 
is also useful for you.

you'll find it in the method "annotateVariants" that dispatches on 
"OrgDb" objects (i.e., gene-centric annotation packages), within 
VariantFiltering/R/annotationEngine.R

if you take a look at it, do not hesitate to comment if you have any 
suggestions to improve it. i also look forward to the annotation gurus' 
feedback on this question :)
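
for what it's worth, here is a minimal base-R sketch of the general idea 
(it is not the actual VariantFiltering code): build a lookup table for the 
ids, check that the mapping is one-to-one, and use match() to expand it back 
to the original vector, so that NA and unknown ids come out as NA and 
repeated ids are kept. the helper name mapOneToOneOrNone() and the toy 
lookup table are made up for illustration; in practice the table would come 
from something like select() on an OrgDb package.

```r
## hypothetical helper, not part of AnnotationDbi or VariantFiltering
mapOneToOneOrNone <- function(ids, tab, from = "ENTREZID", to = "SYMBOL") {
  ## keep only complete rows and drop exact duplicate rows
  keep <- !is.na(tab[[from]]) & !is.na(tab[[to]])
  tab <- unique(tab[keep, c(from, to)])
  ## assertion: every key must map to at most one value
  if (anyDuplicated(tab[[from]]))
    stop("mapping is not one-to-one for some keys")
  ## match() preserves the length and order of 'ids';
  ## NA ids and ids absent from the table yield NA
  tab[[to]][match(ids, tab[[from]])]
}

## toy lookup table standing in for the output of something like
## select(org.Hs.eg.db, keys = unique(na.omit(ids)),
##        columns = "SYMBOL", keytype = "ENTREZID")
tab <- data.frame(ENTREZID = c("1", "2", "9"),
                  SYMBOL = c("A1BG", "A2M", "NAT1"),
                  stringsAsFactors = FALSE)

ids <- c("2", NA, "1", "2", "999999")
symbols <- mapOneToOneOrNone(ids, tab)
symbols
```

the result has the same length as 'ids', with NA in the positions of the NA 
and the unmapped id. the one-to-one check throws an error instead of 
silently picking one of several symbols.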

cheers,

robert.

On 06/18/2014 03:03 PM, Michael Lawrence wrote:
> Let's say I have a vector of gene IDs where some are NA, and some are
> repeated, and I want to map them to gene symbols, where I get NAs for the
> NA IDs or IDs without a symbol. What is the best way to do this?
>
> I tried select() but it gave me a table with unique entries; not very
> convenient. It also does not handle NAs. And totally breaks with duplicates
> using the GENEID key type (kind of works with ENTREZID):
>
> select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
> Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939",
> :
>    269 elements in value to replace 1312 elements
>
> Also tried the venerable mget(GENEID, org.Hs.egSYMBOL, ifnotfound=NA), but
> this returns a list and fails with NAs.
>
> What would be nice is something like:
>
> map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>
> where OneToOneOrNone is an assertion that I expect the mappings to be
> one-to-one, so it will unlist() or whatever and throw an error if the
> assertion fails. It should return NA for anything not found, and for any NA
> GENEID. Does something like this already exist?
>
> Michael
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550


