[BioC] How to check if gene name is an alias or misspelt

Hervé Pagès hpages at fhcrc.org
Sat Apr 11 02:03:36 CEST 2009


Hi Dan,

The org.XX.egALIAS2EG map, combined with a fuzzy matching function
such as agrep(), can help you do this:

   > library(org.Hs.eg.db)
   > get("S-HT3c2", org.Hs.egALIAS2EG)
   Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
     value for "S-HT3c2" not found
   > agrep("S-HT3c2", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=1)
   [1] "5-HT3c2"

The 'max.distance' argument lets you control the maximum number of
misspelled letters (including inserted or deleted letters):

   > get("WUGSC:H-DJO747G182", org.Hs.egALIAS2EG)
   Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
     value for "WUGSC:H-DJO747G182" not found
   > agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=2)
   character(0)
   > agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=3)
   [1] "WUGSC:H_DJ0747G18.2"
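
If you have many names to check, the two steps above (exact lookup first,
fuzzy match as a fallback) could be wrapped in a small helper. This is only
a rough sketch: 'suggest_alias' is a made-up name, not part of any package,
and the 'known' vector is a toy stand-in for keys(org.Hs.egALIAS2EG) so
that the example runs without the annotation package installed.

```r
# Hypothetical helper (not from any package): return the name unchanged if it
# is a known alias, otherwise return the approximate matches found by agrep().
suggest_alias <- function(name, known, max.distance = 1) {
  if (name %in% known) {
    return(name)
  }
  agrep(name, known, value = TRUE, max.distance = max.distance)
}

# Toy stand-in for keys(org.Hs.egALIAS2EG); the real key set is far larger.
known <- c("5-HT3c2", "TP53", "BRCA1")

exact_hit <- suggest_alias("TP53", known)     # exact match, returned as-is
fuzzy_hit <- suggest_alias("S-HT3c2", known)  # "S" typed instead of "5"
print(exact_hit)
print(fuzzy_hit)
```

Note that agrep() matches approximately against substrings, so with the full
alias list it may return several candidates (or longer aliases that merely
contain something close to your name). You would still want to eyeball the
suggestions before accepting one.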

Cheers,
H.


Daniel Brewer wrote:
> Hello,
> 
> I have a list of gene names which are not official gene symbols.  Normally
> in this case I would search for the gene in Entrez to see if it is an alias
> and then take the official symbol.  Is there a way to (semi) automate this
> within Bioconductor?
> 
> If this fails I normally google it to see if it is likely to be a
> misspelling (S instead of 5, etc.).  Any suggestions for that?
> 
> Many thanks
> 
> Dan
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


