[Bioc-devel] Bioc-devel Digest, Vol 86, Issue 14

Paul Shannon pshannon at systemsbiology.org
Fri May 27 23:00:22 CEST 2011


Here at the ISB we have been thinking about this too, and trying out a few solutions.   I use network (i.e., molecular relationship) data of several very different sorts, including

   1) from SBML, the latest yeast consensus metabolic network
   2) an adaptation of Jitao David Zhang's very useful KEGGgraph package
   3) the extensive (but not highly detailed) geneMANIA datasets

and I have an interest in everything else -- Nature/NCI, Pathway Commons, PSI-MI, HPRD, wikipathways, ...

Along the way I have developed a few practices which may be of interest to others.  These relate to the three data sources I list above, so I will number them the same here:

1) Regarding SBML:  the various XML formats (SBML, BioPAX, PSI-MI) are very valuable, but I find that R data.frames can convey all of the same information and are much easier to work with.  The R SBML reader, for instance, is not always in sync with the latest SBML format.  But by using the python interface to libSBML, it was easy to create tab-delimited files which, with the help of the R merge function, combine nicely into one 10,000-line data.frame.
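
A minimal sketch of that merge step, in Python with only the standard library rather than in R.  The two small tables and their column names (reaction, species, role, name, compartment) are invented for illustration and are not the real libSBML exports; the merge helper is a hypothetical stand-in that behaves roughly like an inner join with R's merge(x, y, by=key), without the sorting.

```python
import csv
import io

# Hypothetical tab-delimited exports; columns and values are invented for illustration.
REACTIONS_TSV = (
    "reaction\tspecies\trole\n"
    "R1\tglc\treactant\n"
    "R1\tg6p\tproduct\n"
    "R2\tg6p\treactant\n"
)
SPECIES_TSV = (
    "species\tname\tcompartment\n"
    "glc\tglucose\tcytosol\n"
    "g6p\tglucose-6-phosphate\tcytosol\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge(left, right, key):
    """Inner join of two row lists on a shared column (left row order is kept)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


table = merge(read_tsv(REACTIONS_TSV), read_tsv(SPECIES_TSV), "species")
for row in table:
    print(row["reaction"], row["species"], row["role"], row["compartment"])
```

Each further file can be folded in with another call to merge on whichever column it shares with the growing table.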

2) Regarding KEGGgraph:  Bioc graph objects have been very useful.  I use them for computation and visualization, and sometimes for storage.  In the case of the yeast metabolic network, I read some or all of that 10k-line data.frame into a graphNEL, assigning node and edge attributes as I go.  Jitao Zhang's KEGGgraph package stores all edge and node attributes in the nodeDataDefaults and edgeDataDefaults slots of the graph; to use the KEGG graph, I found it easiest to convert it to a graphNEL with the attributes spread out over the appropriate nodes and edges.  RCytoscape, igraph and Rgraphviz can all be used for visualization.
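
The idea of reading table rows into a graph, with attributes attached to individual nodes and edges rather than left as graph-wide defaults, can be sketched as follows.  This is plain Python, not the graphNEL or KEGGgraph API; the rows, the attribute names and the build_graph helper are all hypothetical.

```python
# Rows as they might come out of a merged table; all values are made up.
rows = [
    {"source": "glc", "target": "g6p", "reaction": "R1", "type": "metabolic"},
    {"source": "g6p", "target": "f6p", "reaction": "R2"},  # no explicit type
]

# Graph-wide defaults, used only when a row does not supply the attribute.
EDGE_DEFAULTS = {"type": "unknown", "reaction": None}


def build_graph(rows, edge_defaults):
    """Return (nodes, edges): node -> attributes and (source, target) -> attributes."""
    nodes, edges = {}, {}
    for row in rows:
        for name in (row["source"], row["target"]):
            nodes.setdefault(name, {"label": name})
        attrs = dict(edge_defaults)  # start from the defaults ...
        attrs.update(  # ... then overwrite with what this row actually carries
            {k: v for k, v in row.items() if k not in ("source", "target")}
        )
        edges[(row["source"], row["target"])] = attrs
    return nodes, edges


nodes, edges = build_graph(rows, EDGE_DEFAULTS)
for (source, target), attrs in edges.items():
    print(source, "->", target, attrs)
```

After this step every edge carries its own complete attribute set, so nothing depends on looking the defaults up again later.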

3) Regarding geneMANIA:  some geneMANIA datasets are huge, and will probably need SQLite storage, from which graphs can be formed. Marc Carlson has made some good suggestions about this.  Other geneMANIA datasets work nicely as data.frames.  
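
One possible shape for the SQLite route, sketched with Python's sqlite3 module.  The table layout (gene_a, gene_b, network, weight), the gene names and the weights are assumptions made for illustration, not the actual geneMANIA schema or data.  The point is only that a query pulls out the slice of interest and the graph is formed from that slice.

```python
import sqlite3

# In-memory database standing in for a large on-disk file; schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE interaction (gene_a TEXT, gene_b TEXT, network TEXT, weight REAL)"
)
con.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?, ?)",
    [
        ("geneA", "geneB", "coexpression", 0.8),
        ("geneA", "geneC", "coexpression", 0.1),
        ("geneB", "geneC", "physical", 0.9),
    ],
)


def subgraph(con, network, min_weight):
    """Select matching edges and return an undirected adjacency dict of weights."""
    adjacency = {}
    query = (
        "SELECT gene_a, gene_b, weight FROM interaction "
        "WHERE network = ? AND weight >= ?"
    )
    for a, b, weight in con.execute(query, (network, min_weight)):
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight
    return adjacency


coexpression = subgraph(con, "coexpression", 0.5)
print(coexpression)
```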

And an overall comment:  many of the curated networks (NCI, KEGG, wikipathways) are organized in terms of pathways.  Interestingly, the yeast metabolic network -- arguably the most detailed collection we currently have -- is not organized into pathways; instead, reactions are the main organizing principle.  This puzzled me at first.  But now I see that it makes more sense to  a) curate discrete reactions and relationships (be they regulatory, phosphorylation, ubiquitination, metabolic, etc.) and b) store and curate pathway membership separately.  Pathways are useful to think with, but often somewhat artificially demarcated.
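
A small sketch of that separation, under invented names: one table of reactions curated on their own, and a second table that only records which reaction belongs to which pathway.  The identifiers and the two helper functions are hypothetical.

```python
# Reactions and relationships, curated independently of any pathway.
reactions = {
    "R1": {"type": "phosphorylation", "source": "kinaseK", "target": "proteinP"},
    "R2": {"type": "regulatory", "source": "proteinP", "target": "geneG"},
    "R3": {"type": "metabolic", "source": "m1", "target": "m2"},
}

# Pathway membership kept as its own many-to-many table of (pathway, reaction).
pathway_membership = [
    ("pathwayA", "R1"),
    ("pathwayA", "R2"),
    ("pathwayB", "R2"),
    ("pathwayB", "R3"),
]


def reactions_in(pathway):
    """Reaction ids assigned to a pathway, in curated order."""
    return [rid for name, rid in pathway_membership if name == pathway]


def pathways_of(reaction_id):
    """All pathways a single reaction has been assigned to."""
    return sorted(name for name, rid in pathway_membership if rid == reaction_id)


print(reactions_in("pathwayA"))
print(pathways_of("R2"))
```

Redrawing a pathway boundary then only touches the membership table; the reactions themselves stay as curated.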

To sum up, I propose that

   1) data.frames serve us better than XML for disk storage of molecular relationships
   2) bioc graphs are convenient for analysis (using, say, RBGL and igraph) and for visualization
   3) transformation from one format to the other needs to be easy
   4) querying tools will probably become a focus at some point  (e.g., 'Give me the transcription factors of gene X.  Give me the kinase/substrate pairs involved in the signaling cascade(s) upstream of transcription factor Y.')
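
To make point 4 concrete, here is a rough sketch of the two example queries over a typed edge list.  The edges, the relation labels ("transcription", "phosphorylation") and both function names are assumptions for illustration; a real tool would sit on top of whatever storage and graph classes end up being used.

```python
# Directed, typed edges as (source, target, relation); all names are made up.
edges = [
    ("kinase1", "kinase2", "phosphorylation"),
    ("kinase2", "tfY", "phosphorylation"),
    ("kinase3", "tfZ", "phosphorylation"),
    ("tfY", "geneX", "transcription"),
    ("tfZ", "geneX", "transcription"),
]


def transcription_factors_of(gene, edges):
    """Sources of 'transcription' edges that point at the gene."""
    return sorted(s for s, t, kind in edges if t == gene and kind == "transcription")


def upstream_kinase_pairs(tf, edges):
    """Follow phosphorylation edges backwards from tf; return (kinase, substrate) pairs."""
    pairs, frontier, seen = [], [tf], {tf}
    while frontier:
        target = frontier.pop()
        for s, t, kind in edges:
            if t == target and kind == "phosphorylation":
                pairs.append((s, t))
                if s not in seen:  # visit each upstream node only once
                    seen.add(s)
                    frontier.append(s)
    return sorted(pairs)


print(transcription_factors_of("geneX", edges))
print(upstream_kinase_pairs("tfY", edges))
```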

 - Paul

On May 27, 2011, at 8:24 AM, Tarca, Adi wrote:

> Hi Robert,
> 
> I think it would be a good idea to have a wikipathways package that would include not only gene / small-molecule memberships
> but also the types of relations between them, which would enable packages such as SPIA and others to perform pathway analysis. I am not sure what would be the
> best container for these pathways. Maybe a graph type of object in which the additional properties of the edges (such as the type of relation, e.g. activation, and the
> mechanism type, e.g. phosphorylation) are included. It would also help to have some mechanism to extract subgraphs based on certain types of relations only, and to support multiple organisms.
> regards,
> Adi Tarca
> 
> 
> 
> ________________________________________
> From: bioc-devel-bounces at r-project.org [bioc-devel-bounces at r-project.org] On Behalf Of bioc-devel-request at r-project.org [bioc-devel-request at r-project.org]
> Sent: Friday, May 27, 2011 6:00 AM
> To: bioc-devel at r-project.org
> Subject: Bioc-devel Digest, Vol 86, Issue 14
> 
> Today's Topics:
> 
>   1. interest in wikipathway annotation packages (Robert M. Flight)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 26 May 2011 14:27:41 -0400
> From: "Robert M. Flight" <rflight79 at gmail.com>
> To: bioc-devel at r-project.org
> Subject: [Bioc-devel] interest in wikipathway annotation packages
> Message-ID: <BANLkTim1HmEd87v0Om_0H2Fw=hnzYc_rcA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi All,
> 
> As a means to providing alternatives to KEGG pathways, I'm wondering
> if there is any interest or previous work done on creating annotation
> packages for WikiPathways or data from PathwayCommons. I have been
> considering getting into this, but am wondering if there is interest
> or support from the general community. From what I can tell,
> generating at least gene / small-molecule memberships in pathways
> should not be too difficult. If anyone knows different, I would
> appreciate feedback.
> 
> Cheers,
> 
> -Robert
> 
> Robert M. Flight, Ph.D.
> University of Louisville Bioinformatics Laboratory
> University of Louisville
> Louisville, KY
> 
> PH 502-852-1809 (HSC)
> PH 502-852-0467 (Belknap)
> EM robert.flight at louisville.edu
> EM rflight79 at gmail.com
> 
> Williams and Holland's Law:
> If enough data is collected, anything may be proven by
> statistical methods.
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Bioc-devel mailing list
> Bioc-devel at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> End of Bioc-devel Digest, Vol 86, Issue 14
> 


