[BioC] KEGG pathname to Refseq accession
Vincent Carey
stvjc at channing.harvard.edu
Fri Oct 29 19:22:51 CEST 2010
Here is a partly complete function that addresses this concern. There are two
problems with your query. First, the question is not organism-independent:
KEGG maps pathways to genes separately for each organism, so you have to say
which organism you mean. Second, you need to know the exact tokenization
(spelling) of the pathway names used in the KEGG.db package:
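library(KEGG.db)
library(org.Hs.eg.db)
kpname2rs = function(pn="p53 signaling pathway", orgpref="hsa",
    genepack="org.Hs.eg.db") {
  # map the pathway name to its numeric KEGG pathway ID
  pid = get(pn, revmap(KEGGPATHID2NAME))
  # prefix with the organism code (e.g. "hsa04115") to get Entrez gene IDs
  nexts = get(paste(orgpref, pid, sep=""), KEGGPATHID2EXTID)
  # Entrez IDs -> RefSeq; genepack is not used yet, the map is hard-wired to human
  unique(unlist(mget(nexts, org.Hs.egREFSEQ)))
}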
The result of kpname2rs() with the sessionInfo given below has length 325 and
includes a few of the accessions in your list. The code would have to be altered
to handle different organisms properly and to make fewer assumptions about which
mapping packages are used.
R version 2.12.0 Patched (2010-10-15 r53331)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] org.Hs.eg.db_2.4.6 KEGG.db_2.4.5 RSQLite_0.9-2
[4] DBI_0.2-5 AnnotationDbi_1.11.9 Biobase_2.10.0
[7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2
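To get a named list covering several pathways, like the one in the question, a small wrapper could apply kpname2rs() to each name in turn. This is a minimal sketch, not part of KEGG.db; the name kpnames2rs and its f argument are hypothetical, introduced here for illustration. Extra arguments such as orgpref or genepack are passed through unchanged to the lookup function.

```r
# Hypothetical wrapper (not part of KEGG.db): run a single-pathway lookup
# such as kpname2rs() over several pathway names.
# pns: character vector of KEGG pathway names
# f:   lookup function taking one pathway name; defaults to kpname2rs
# ...: further arguments passed on to f (e.g. orgpref, genepack)
kpnames2rs = function(pns, f = kpname2rs, ...) {
  res = lapply(pns, f, ...)
  names(res) = pns  # name each list element by its pathway
  res
}
```

For example, kpnames2rs(c("p53 signaling pathway", "Notch signaling pathway")) would return one vector of accessions per pathway, assuming both names match the KEGG.db spelling exactly; candidate spellings can be checked with grep("Notch", unlist(as.list(KEGGPATHID2NAME)), value = TRUE). As written, a single misspelled name makes get() fail for the whole call.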
On Fri, Oct 29, 2010 at 9:46 AM, Viki S <isvik at live.com> wrote:
>
> Hi,
> I installed KEGG.db and org.Hs.eg.db packages. I could not find any function linking KEGG IDs / KEGG pathway Names to Refseq accession. I want to obtain a list like this:
>
> $notch_delta_signalling
> [1] "NM_002405" "AL133036" "NM_003260" "NM_004316" "NM_005077" "NM_005078" "NM_012486" "NM_004557" "NM_006161" "NM_005618" "NM_020999" "AF038196" "AA775404" "AK000144" "NM_007318"
> [16] "NM_007319" "NM_015383" "AW247193" "AA503101" "U94354" "H49418" "NM_000021" "AI961337" "AA583350" "NM_000214" "NM_019074" "NM_017617" "NM_016941" "NM_000435" "NM_000447"
> [31] "AL050141" "AI334327" "NM_002226" "NM_000890"
>
> $p53_signalling
> [1] "NM_002307" "NM_002392" "NM_003352" "NM_002745" "AI167145" "AI218142" "NM_004324" "NM_003633" "NM_013229" "AA845428" "AB036063" "NM_006024" "NM_005427" "NM_005446" "NM_005657"
> [16] "NM_007194" "U09579" "T79183" "NM_006878" "NM_006880" "NM_006881" "NM_006882" "AI796137" "NM_000051" "NM_000076" "NM_000077" "NM_000156" "AJ276888" "NM_001274" "NM_000546"
> [31] "NM_002066" "H93075" "AL157438"
>
> Any suggestions?
>
> Thanks
> Viki S
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>