[BioC] slow insertions into graphNEL object (24 hours for 16k nodes)

Seth Falcon sfalcon at fhcrc.org
Tue Sep 11 23:27:13 CEST 2007


Hi Paul,

Paul Shannon <pshannon at systemsbiology.org> writes:
> It took nearly 24 hours (!) to create a 16k node graph using two
> different techniques:
>
>     g = fromGXL (file ('someFile.gxl'))
>
> and
>
>    g = new ('graphNEL', edgemode='undirected')
>    edgeDataDefaults (g, attr='edgeType') = 'edge'
>    edgeDataDefaults (g, attr='source') = 'unknown'
>
>    ...
>
>    for (r in 1:max) {
>      ...
>      g = addNode (a, g)
>      g = addNode (b, g)
>      g = addEdge (a, b, g)
>      edgeData (g, a, b, 'source') = source
>      edgeData (g, a, b, 'edgeType') = method
>      }
>
> The 16k nodes and their edges are from a suitably parsed version of
> all of the reactions reported by KEGG.
>
> Is this user error, user misconception, ... or maybe an inefficiency
> that future versions of the graph package could improve upon?

It looks like fromGXL is doing something quite similar to the for loop
you describe above.  As you have demonstrated, this is not the most
efficient way to construct graph objects.  The immediate reason is
that each call to addNode, addEdge, and edgeData creates a new copy of
the _entire_ graph, so the total cost of the loop grows roughly
quadratically with the size of the graph.
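
As a rough illustration of avoiding the per-call copies, here is a minimal
sketch that collects all edges first and then builds the graph in one step.
It assumes the graph package's ftM2graphNEL constructor and the vectorized
form of the edgeData replacement function; the `edges` table and its values
are hypothetical stand-ins for the parsed KEGG reactions, and this is one
possible workaround, not a description of how fromGXL works or will be fixed.

```r
library(graph)  # Bioconductor 'graph' package

# Hypothetical edge table standing in for the parsed KEGG reactions.
edges <- data.frame(
  from     = c("A", "B", "C"),
  to       = c("B", "C", "D"),
  source   = c("kegg", "kegg", "kegg"),
  edgeType = c("reaction", "reaction", "reaction"),
  stringsAsFactors = FALSE
)

# Build all nodes and edges in a single call from a two-column
# from/to matrix, instead of one addNode/addEdge call per row.
ft <- cbind(edges$from, edges$to)
g <- ftM2graphNEL(ft, edgemode = "undirected")

# Declare the attribute defaults once, as in the original loop version.
edgeDataDefaults(g, attr = "source") <- "unknown"
edgeDataDefaults(g, attr = "edgeType") <- "edge"

# Set each attribute for all edges with one vectorized assignment,
# so the graph is copied a handful of times rather than once per edge.
edgeData(g, from = edges$from, to = edges$to, attr = "source") <- edges$source
edgeData(g, from = edges$from, to = edges$to, attr = "edgeType") <- edges$edgeType
```

The point of the sketch is only that the number of graph copies stays
constant as the edge table grows; whether it is fast enough for 16k nodes
would need to be measured.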

We will look into this and see if we can provide some relief for
fromGXL.  Thanks for the report.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/


