[BioC] how to store multiple relationships between nodes in graphNEL?

Wed Jul 25 00:12:04 CEST 2007

Hi Seth,

Thanks for a bunch of good suggestions.

On your specific question,

Me:
>> And how could I best store the pubmed id associated with each method?

You:
> Not sure I'm following you, what does "each method" refer to here?
>

Yep, 'method' is ambiguous.  Sorry.  I meant to say:

   1) Not infrequently a protein-protein interaction is identified
      by multiple experimental methods, which gives it extra credibility.

   2) A PubMed ID is often included in archived interaction data,
      providing a reference (a paper) for each underlying experimental
      method.  In Cytoscape, we often make these clickable links, so you
      can examine the abstract and decide whether you want to include
      the interaction in your network.

In general, I am trying to figure out the best (and quickest) way to
adapt graphNEL to store possibly many relationships between two nodes
in a graph, where each relationship may have several attributes to
describe it.  For each experimentally derived relationship, there may
be a PubMed ID, a confidence score, some indication of the scale of
the experiment, etc.  I needed (and you suggested) a quick solution
for now.

As I understand it, you suggest that two more thorough-going strategies
may be worth considering: use a list of S4 objects, or create a new
multiGraph class.  When I have a little time, I will look into these.
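Here is a minimal R sketch of the "list of S4 objects" idea.  It assumes one S4 record per experimentally derived relationship, stored in a list keyed by the same "from|to" edge names that edgeData reports.  The class `InteractionEvidence`, its slots, and the helper `methodsForEdge` are hypothetical illustrations; they are not part of the graph package or DIP.  Unknown PubMed IDs, scores and scales are left as NA placeholders rather than invented.

```r
library(methods)

## Hypothetical S4 class: one record per experimentally derived relationship.
## Slot names are illustrative assumptions, not part of graph or DIP.
setClass("InteractionEvidence",
         representation(method     = "character",
                        pubmedId   = "character",
                        confidence = "numeric",
                        scale      = "character"))

## Convenience constructor; unknown fields default to NA placeholders.
evidenceRecord <- function(method, pubmedId = NA_character_,
                           confidence = NA_real_, scale = NA_character_) {
  new("InteractionEvidence", method = method, pubmedId = pubmedId,
      confidence = confidence, scale = scale)
}

## One list of records per edge, keyed like graph's edge names ("from|to").
edgeEvidence <- list(
  "YCR084C|YBR112C" = list(
    evidenceRecord("Immunoprecipitation"),
    evidenceRecord("Affinity chromatography"),
    evidenceRecord("Gel filtration chromatography")
  )
)

## Hypothetical helper: the experimental methods recorded for one edge.
methodsForEdge <- function(ev, from, to) {
  recs <- ev[[paste(from, to, sep = "|")]]
  vapply(recs, function(r) r@method, character(1))
}

methodsForEdge(edgeEvidence, "YCR084C", "YBR112C")
```

Edge attributes can hold any R object.  A per-edge list like `edgeEvidence[["YCR084C|YBR112C"]]` could therefore also be stored directly as a single graphNEL edge attribute.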

By the way, the graph package is a delight, as is the way it meshes
with RBGL.

For example, I just figured out how easy it is to find putative
relationships between previously unconnected nodes, using a reference
graph (in my case, DIP), the graph package, and RBGL:

   subGraph(sp.between (g, node1, node2)[[1]]$path, g)

That's lovely!

  - Paul

> I'd like to find the best way to record multiple relationships
> between nodes in a graphNEL object.  The data for my graph comes
> from DIP, the Database of Interacting Proteins, where many protein
> interactions have several kinds of evidence.  In other settings, I
> represent this as multiple edges; another solution is needed here,
> since graphNEL is designed for at most one edge between nodes.
>

One possibility might be a list of graphNEL objects all with the same
node set.  You could also explore a more structured approach and
implement a multiGraph class.
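A multiGraph class could start as small as the following sketch.  It does not use the graph package.  `SimpleMultiGraph`, `makeMultiGraph` and `edgesBetween` are hypothetical names.  Edges are kept in a data frame, so one pair of nodes can carry several rows, one per relationship.

```r
library(methods)

## Hypothetical minimal multigraph: a shared node set plus an edge table that
## may hold several rows (relationships) for the same pair of nodes.
## This is NOT a class from the graph package; it only sketches the idea.
setClass("SimpleMultiGraph",
         representation(nodes = "character", edges = "data.frame"))

makeMultiGraph <- function(nodes, from, to, method) {
  ## every edge endpoint must be a known node
  stopifnot(all(c(from, to) %in% nodes))
  new("SimpleMultiGraph", nodes = nodes,
      edges = data.frame(from = from, to = to, method = method,
                         stringsAsFactors = FALSE))
}

## All relationships between a and b, ignoring edge direction.
edgesBetween <- function(mg, a, b) {
  e <- mg@edges
  hit <- (e$from == a & e$to == b) | (e$from == b & e$to == a)
  e[hit, , drop = FALSE]
}

mg <- makeMultiGraph(
  nodes  = c("YCR084C", "YBR112C"),
  from   = rep("YCR084C", 3),
  to     = rep("YBR112C", 3),
  method = c("Immunoprecipitation", "Affinity chromatography",
             "Gel filtration chromatography"))

edgesBetween(mg, "YBR112C", "YCR084C")$method
```

Further per-relationship columns, such as a PubMed ID or a confidence score, could be added to the edge table without changing the lookup.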

> So I am improvising, packing any number of experimental methods into
> a token-separated list in a single edge's edgeData.  Here is an
> example of one pair of yeast proteins observed by three different
> methods:
>

>
>    edgeData (g, 'YCR084C', 'YBR112C', attr='edgeType')
>      $`YCR084C|YBR112C`
>       [1] "Immunoprecipitation::Affinity chromatography::Gel filtration chromatography"
>

I think you can avoid the token-separation game, but maybe I'm missing
something.  The edge attributes can be any R object, even, say, a
character vector with length greater than 1 ;-)

So why not have

    edgeData (g, 'YCR084C', 'YBR112C', attr='edgeType')
      $`YCR084C|YBR112C`
       [1] "Immunoprecipitation" "Affinity chromatography" "Gel filtration chromatography"
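Some edges may already hold the token-separated strings.  A base-R sketch like the following could convert one of them into the character vector suggested here.  The `"::"` separator is taken from the example above, and the variable names are illustrative.

```r
## Token-separated value as currently stored in the 'edgeType' attribute.
packed <- "Immunoprecipitation::Affinity chromatography::Gel filtration chromatography"

## Split on the literal "::" separator (fixed = TRUE, so no regex),
## giving one element per experimental method.
methodVec <- strsplit(packed, "::", fixed = TRUE)[[1]]

methodVec
```

The resulting vector could then be stored back as that edge's 'edgeType' attribute.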

I'm not familiar with this data so I don't know if that makes sense or
is what you want.  Another option for using the edge attributes might
be to use a list (or even an S4 class) with named components -- but
here it isn't clear whether simply using additional edge attributes
might be better.  For example, you could store a logical value for
each edge type:

    define edge attributes: type1, type2, type3
    for each edge, the value of edge attributes type1-3 is TRUE or
    FALSE depending on whether this edge is of that type.
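The logical-attribute idea can be sketched in plain R as a table with one row per edge and one TRUE/FALSE column per edge type.  The edge-to-methods list and the placeholder edge "P1|P2" are hypothetical inputs, and no graph package calls are used.

```r
## Hypothetical per-edge method vectors; "P1|P2" is an invented placeholder edge.
edgeMethods <- list(
  "YCR084C|YBR112C" = c("Immunoprecipitation", "Affinity chromatography",
                        "Gel filtration chromatography"),
  "P1|P2" = "Immunoprecipitation"
)

## The edge types that would become logical edge attributes (type1-3 above).
edgeTypes <- c("Immunoprecipitation", "Affinity chromatography",
               "Gel filtration chromatography")

## vapply gives one column per edge; transpose to one row per edge,
## one TRUE/FALSE column per edge type.
typeFlags <- t(vapply(edgeMethods,
                      function(m) edgeTypes %in% m,
                      logical(length(edgeTypes))))
colnames(typeFlags) <- edgeTypes

typeFlags
```

Each column of `typeFlags` could then be written to the matching logical edge attribute.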