[BioC] biomaRt -> XML::addNode masks graph [was 'slow insertions']

Paul Shannon pshannon at systemsbiology.org
Tue Sep 11 23:58:16 CEST 2007


Hi Seth,

If you are digging around in the innards of the graph package, I have
another suggestion -- an imperfect one -- to offer.

I sometimes use biomaRt and graph in the same project.  biomaRt
requires XML, which, like graph, has a method called 'addNode':

    library (biomaRt)
    Loading required package: XML

    Attaching package: 'XML'

         The following object(s) are masked from package:graph :
        addNode

I can work around this just fine by calling graph::addNode, but maybe
an alias could be adopted as well, and then favored over the long
term -- 'add.node' or some such thing.
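
A minimal sketch of that workaround, assuming the graph package is
installed. The name 'add.node' here is only a local, hypothetical alias
defined in the user's own session; it is not something the graph
package provides.

```r
library(graph)

# Hypothetical local alias; NOT part of the graph package.
add.node <- graph::addNode

g <- new("graphNEL", nodes = c("a", "b"), edgemode = "undirected")

# A namespace-qualified call resolves to graph's addNode even if another
# attached package (such as XML) masks the unqualified name.
g <- graph::addNode("c", g)

# The alias points at the same generic, so it is unaffected by masking too.
g <- add.node("d", g)

nodes(g)
```

Both calls bypass the search path, so the order in which biomaRt, XML
and graph are attached should not matter.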

Or maybe this isn't worth bothering with.

  - Paul

> Hi Paul,
>
> Paul Shannon <pshannon at systemsbiology.org> writes:
>> It took nearly 24 hours (!) to create a 16k node graph using two
>> different techniques:
>>
>>     g = fromGXL (file ('someFile.gxl'))
>>
>> and
>>
>>    g = new ('graphNEL', edgemode='undirected')
>>    edgeDataDefaults (g, attr='edgeType') = 'edge'
>>    edgeDataDefaults (g, attr='source') = 'unknown'
>>
>>    ...
>>
>>    for (r in 1:max) {
>>      ...
>>      g = addNode (a, g)
>>      g = addNode (b, g)
>>      g = addEdge (a, b, g)
>>      edgeData (g, a, b, 'source') = source
>>      edgeData (g, a, b, 'edgeType') = method
>>      }
>>
>> The 16k nodes and their edges are from a suitably parsed version of
>> all of the reactions reported by KEGG.
>>
>> Is this user error, user misconception, ... or maybe an inefficiency
>> that future versions of the graph package could improve upon?
>
> It looks like fromGXL is doing something quite similar to the for loop
> you describe above.  As you have demonstrated, this is not the most
> efficient way to construct graph objects.  The immediate reason is
> that each call to addNode, addEdge, and edgeData creates a new copy of
> the _entire_ graph.
>
> We will look into this and see if we can provide some relief for
> fromGXL.  Thanks for the report.
>
> + seth
>
> --  
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> BioC: http://bioconductor.org/
> Blog: http://userprimary.net/user/
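
Given the copying described above, one possible way to sidestep the
per-call cost is to collect all edges first and build the graph in a
single call. This is only a sketch, assuming the edges are available as
a two-column from/to matrix; the node names are made up for
illustration, and edge attributes are not covered here.

```r
library(graph)

# Illustrative from/to matrix; in practice this would hold all parsed edges,
# with each undirected edge listed once.
ft <- cbind(from = c("a", "b"),
            to   = c("b", "c"))

# Build the whole graph in one call instead of growing it edge by edge,
# so the graph object is not copied once per addNode/addEdge call.
g <- ftM2graphNEL(ft, edgemode = "undirected")

numNodes(g)
numEdges(g)
```

This does not change fromGXL itself; it only avoids the loop on the
user's side.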


