[BioC] KEGG: gene ids for nodes in a pathway

Tue May 5 16:20:02 CEST 2009

Hi Tim,

If you are willing to make a strong assumption about gene symbols, then 
you can group the pathway's genes into 'nodes' using tapply().

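library(org.Hs.eg.db)

## Entrez Gene IDs for KEGG pathway 04630 (Jak-STAT), mapped to symbols;
## names(a) are the Entrez Gene IDs
a <- unlist(mget(mget("04630", revmap(org.Hs.egPATH),
                      ifnotfound = NA)[[1]], org.Hs.egSYMBOL))

## strip trailing digits to get the 'base' symbol, e.g. STAT1 -> STAT
b <- sub("[0-9]+$", "", a)

## group symbols by base symbol (each group keeps its Entrez Gene IDs as names)
tapply(seq_along(a), b, function(x) a[x])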

This assumes that any digits at the end of a gene symbol can be 
stripped off to get the 'base' gene type (e.g. IL2, IL3, IL4 and IL21 
are all interleukins), and that all gene symbols are named 
consistently.
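To see what the grouping step does without loading org.Hs.eg.db, here is a small base-R sketch. The vector `a` is a hand-made stand-in for the result of the mget() call (a few symbols, named by their Entrez Gene IDs), not the real gene list for 04630. split(a, b) should give the same groups as the tapply() call, and since the names survive the subsetting, lapply(..., names) pulls out the Entrez Gene IDs per 'node':

```r
## Hand-made stand-in for 'a': gene symbols named by Entrez Gene ID
## (illustrative subset only, not the full KEGG 04630 gene list)
a <- c("6772" = "STAT1", "6774" = "STAT3", "3716" = "JAK1",
       "3717" = "JAK2", "3558" = "IL2", "59067" = "IL21")

## Strong assumption: trailing digits mark members of one 'node'
b <- sub("[0-9]+$", "", a)

## split() yields the same groups as tapply(seq_along(a), b, function(x) a[x])
groups <- split(a, b)

## The names carry the Entrez Gene IDs, so this lists the IDs per 'node'
ids_by_node <- lapply(groups, names)
ids_by_node
```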

Alternatively, you could assume that stripping off the last letter or 
two gets you the 'base' gene symbol, which might get you a bit closer to 
what you want. Again, strong assumptions apply.
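One way to encode the 'strip a letter or two' idea is a regular expression that removes trailing digits plus at most one trailing letter. The pattern is my own assumption, not something checked against the full pathway, and the symbols are a hand-picked example:

```r
## Hand-picked example symbols (not the real 04630 gene list)
a <- c("STAT5A", "STAT5B", "STAT1", "JAK1", "PIAS1", "IL2RA")

## Assumed heuristic: drop trailing digits plus at most one trailing letter,
## so STAT5A, STAT5B and STAT1 all collapse to "STAT"
b <- sub("[0-9]+[A-Z]?$", "", a)

groups <- split(a, b)
groups
```

Symbols that don't end in digits plus a single letter, such as IL2RA here, are left unchanged and land in a group of their own, which is exactly where these assumptions break down.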

Best,

Jim

Tim Smith wrote:
> Hi Marc & Saroj,
> 
> Thanks for the replies.
> 
> As Saroj suggested, I could use grep to get to 'STAT1',
> 'STAT3', etc. for the STAT pathway. However, I would like to
> automate the process for the pathway (and possibly several pathways).
> With grep, I would need to actually look at the pathway in KEGG,
> figure out the nodes (e.g. 'STAT', 'JAK', 'PI3K', etc.) and then
> perform a grep for each of these to get to the genes (e.g. 'STAT1',
> 'STAT3', etc. for the 'STAT' node) associated with each of these
> nodes. What I was looking for was something I could use to
> automate the process. I guess I could still use grep if there
> was some way of getting to all the node labels ('STAT') in a
> particular pathway. Is there such functionality?
> 
> thanks again!
> 
> ________________________________
> From: Marc Carlson <mcarlson at fhcrc.org>
> Cc: bioc <bioconductor at stat.math.ethz.ch>
> Sent: Monday, May 4, 2009 6:05:52 PM
> Subject: Re: [BioC] KEGG: gene ids for nodes in a pathway
> 
> Hi Tim,
> 
> I think that the mapping you are using below already maps the entrez
> gene IDs associated with a particular pathway. All you need to do is
> use mget() instead of toTable().
> 
> So for pathway "04630", you can just get the associated entrez gene
> IDs like this:
> 
> library(org.Hs.eg.db)
> mget("04630", revmap(org.Hs.egPATH), ifnotfound=NA)
> 
> Marc
> 
> Tim Smith wrote:
>> Hi,
>> 
>> I wanted a list of genes for a particular pathway arranged
>> nodewise. For example, if I select the Jak-stat pathway
>> ("http://www.genome.jp/kegg/pathway/hsa/hsa04630.html"), how do I
>> get the entrez ids of genes associated with the node 'STAT' ?
>> Currently, I use the following code:
>> 
>> x <- toTable(org.Hs.egPATH)
>> 
>> and then select genes associated with a particular pathway (e.g.
>> for Jak-stat: "04630"). But this gives the entire set of genes
>> associated with the pathway. Is there a way to get the entrez ids
>> of the genes associated with each of the nodes ('JAK', 'STAT',
>> 'STAM','PIAS' etc.) in the pathway?
>> 
>> thanks!
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826