[BioC] KEGG pathname to Refseq accession

Vincent Carey stvjc at channing.harvard.edu
Fri Oct 29 19:22:51 CEST 2010


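Here is a partly complete function that addresses this concern.  There
are two problems with your query.  First, the question is not
organism-independent: the KEGG gene mappings are keyed by an organism
prefix such as "hsa".  Second, you need to know the specific
tokenization (exact spelling) of pathway names used in the KEGG.db
package, e.g. "p53 signaling pathway" rather than "p53_signalling".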
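
One way to find the exact spelling is to search the names stored in
KEGGPATHID2NAME.  The following is only a sketch: find_kegg_names is a
hypothetical helper name, not something provided by KEGG.db, and the
lookup part runs only if KEGG.db is installed.

```r
# Hypothetical helper (not part of KEGG.db): case-insensitive search of a
# character vector of pathway names; matching elements keep their names.
find_kegg_names = function(pattern, pnames) {
  grep(pattern, pnames, ignore.case=TRUE, value=TRUE)
}

# With KEGG.db installed, take the names from KEGGPATHID2NAME; the
# resulting vector is named by KEGG pathway id.
if (requireNamespace("KEGG.db", quietly=TRUE)) {
  library(KEGG.db)
  pnames = unlist(as.list(KEGGPATHID2NAME))
  print(find_kegg_names("p53", pnames))
  print(find_kegg_names("notch", pnames))
}
```

Whatever string this prints for the pathway of interest is the value to
pass as the pathway name in the function below.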

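library(KEGG.db)       # KEGGPATHID2NAME, KEGGPATHID2EXTID
library(org.Hs.eg.db)  # org.Hs.egREFSEQ

kpname2rs = function(pn="p53 signaling pathway", orgpref="hsa",
   genepack="org.Hs.eg.db") {
 # pathway name -> KEGG pathway id
 pid = get(pn, revmap(KEGGPATHID2NAME))
 # organism prefix + pathway id -> Entrez gene ids
 nexts = get(paste(orgpref, pid, sep=""), KEGGPATHID2EXTID)
 # gene ids -> RefSeq; genepack is not used yet, the human map is hard-coded
 unique(unlist(mget(nexts, org.Hs.egREFSEQ)))
}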

The result of kpname2rs() with the sessionInfo given below has length
325 and includes a few of the results that you gave.  The code has to
be altered to deal appropriately with different organisms and to make
fewer assumptions about the mapping packages used: as written, the
genepack argument is ignored and org.Hs.egREFSEQ is hard-coded.
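
One possible direction for that alteration is sketched here.  It is not
tested against other organisms.  The names kpname2rs2 and
refseq_mapname are hypothetical, and the sketch assumes that the gene
package follows the org.Xx.eg.db naming scheme, exports a map called
"<prefix>REFSEQ", and that KEGGPATHID2EXTID has entries for the chosen
organism prefix.

```r
# Assumption: "org.Hs.eg.db" exports "org.Hs.egREFSEQ", "org.Mm.eg.db"
# exports "org.Mm.egREFSEQ", and so on.
refseq_mapname = function(genepack) {
  paste(sub("\\.db$", "", genepack), "REFSEQ", sep="")
}

kpname2rs2 = function(pn="p53 signaling pathway", orgpref="hsa",
   genepack="org.Hs.eg.db") {
  library(KEGG.db)
  library(genepack, character.only=TRUE)
  # pathway name -> KEGG pathway id
  pid = get(pn, revmap(KEGGPATHID2NAME))
  # organism prefix + pathway id -> gene ids
  nexts = get(paste(orgpref, pid, sep=""), KEGGPATHID2EXTID)
  # look the RefSeq map up by name instead of hard-coding the human one
  rsmap = getExportedValue(genepack, refseq_mapname(genepack))
  # unlike the original, tolerate unmapped gene ids and drop the NAs
  rs = unique(unlist(mget(nexts, rsmap, ifnotfound=NA)))
  rs[!is.na(rs)]
}

# Usage (needs the annotation packages installed), returning a named list
# with one element per pathway:
# sapply(c("p53 signaling pathway", "Notch signaling pathway"),
#        kpname2rs2, simplify=FALSE)
```

Because NAs are dropped, the length of the result may differ from the
325 reported for kpname2rs().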

R version 2.12.0 Patched (2010-10-15 r53331)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
[1] org.Hs.eg.db_2.4.6   KEGG.db_2.4.5        RSQLite_0.9-2
[4] DBI_0.2-5            AnnotationDbi_1.11.9 Biobase_2.10.0
[7] weaver_1.15.0        codetools_0.2-2      digest_0.4.2


On Fri, Oct 29, 2010 at 9:46 AM, Viki S <isvik at live.com> wrote:
>
> Hi,
> I installed KEGG.db and org.Hs.eg.db packages. I could not find any function linking KEGG IDs / KEGG pathway Names to Refseq accession. I want to obtain a list like this:
>
> $notch_delta_signalling
>  [1] "NM_002405" "AL133036"  "NM_003260" "NM_004316" "NM_005077" "NM_005078" "NM_012486" "NM_004557" "NM_006161" "NM_005618" "NM_020999" "AF038196"  "AA775404"  "AK000144"  "NM_007318"
> [16] "NM_007319" "NM_015383" "AW247193"  "AA503101"  "U94354"    "H49418"    "NM_000021" "AI961337"  "AA583350"  "NM_000214" "NM_019074" "NM_017617" "NM_016941" "NM_000435" "NM_000447"
> [31] "AL050141"  "AI334327"  "NM_002226" "NM_000890"
>
> $p53_signalling
>  [1] "NM_002307" "NM_002392" "NM_003352" "NM_002745" "AI167145"  "AI218142"  "NM_004324" "NM_003633" "NM_013229" "AA845428"  "AB036063"  "NM_006024" "NM_005427" "NM_005446" "NM_005657"
> [16] "NM_007194" "U09579"    "T79183"    "NM_006878" "NM_006880" "NM_006881" "NM_006882" "AI796137"  "NM_000051" "NM_000076" "NM_000077" "NM_000156" "AJ276888"  "NM_001274" "NM_000546"
> [31] "NM_002066" "H93075"    "AL157438"
>
> Any suggestions ?
>
> Thanks
> Viki S


