[BioC] How to check if gene name is an alias or misspelt
Hervé Pagès
hpages at fhcrc.org
Sat Apr 11 02:03:36 CEST 2009
Hi Dan,
The org.XX.egALIAS2EG map, combined with a fuzzy matching function
such as agrep(), can help you do this:
> library(org.Hs.eg.db)
> get("S-HT3c2", org.Hs.egALIAS2EG)
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
value for "S-HT3c2" not found
> agrep("S-HT3c2", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=1)
[1] "5-HT3c2"
The 'max.distance' argument lets you control the maximum number of
misspelled letters (including inserted/deleted letters):
> get("WUGSC:H-DJO747G182", org.Hs.egALIAS2EG)
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
value for "WUGSC:H-DJO747G182" not found
> agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=2)
character(0)
> agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=3)
[1] "WUGSC:H_DJ0747G18.2"
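To semi-automate this, the two steps above (exact lookup first, then agrep() with an increasing 'max.distance') could be wrapped in a small helper. The following is only a sketch: suggest_alias(), its arguments, and the toy 'known' vector are hypothetical names made up for illustration, and the toy vector stands in for keys(org.Hs.egALIAS2EG) so that the code runs without the annotation package.

```r
# Hypothetical helper (not part of any package): return the query itself if it
# is a known alias, otherwise the closest fuzzy matches found by agrep().
suggest_alias <- function(query, known, max_dist = 3L) {
  # Exact hit: the name is already a known alias or symbol.
  if (query %in% known) {
    return(query)
  }
  # Widen the allowed edit distance one step at a time and stop at the
  # first distance that yields at least one candidate.
  for (d in seq_len(max_dist)) {
    hits <- agrep(query, known, max.distance = d, value = TRUE)
    if (length(hits) > 0L) {
      return(hits)
    }
  }
  # Nothing within max_dist edits.
  character(0)
}

# Toy stand-in for keys(org.Hs.egALIAS2EG); an assumption for illustration.
known <- c("5-HT3c2", "WUGSC:H_DJ0747G18.2", "TP53", "BRCA1")

suggest_alias("S-HT3c2", known)
suggest_alias("WUGSC:H-DJO747G182", known)
```

With the real map you would pass keys(org.Hs.egALIAS2EG) as 'known'. Note that agrep() does approximate substring matching, so on the full key set a short query may return several candidates that still need a manual check.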
Cheers,
H.
Daniel Brewer wrote:
> Hello,
>
> I have a list of genes which are not official gene symbols. Normally in
> this case I would search for the gene in Entrez to see if it is an alias and
> then take the official symbol. Is there a way to (semi) automate this
> within bioconductor?
>
> If this fails I normally google it to see if it is likely to be a
> misspelling (S instead of 5, etc.). Any suggestions for that?
>
> Many thanks
>
> Dan
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319