[BioC] slow graph methods [was: 'slow insertions in to graphNEL object (24 hours for 16k nodes)']

Paul Shannon pshannon at systemsbiology.org
Wed Sep 26 21:44:40 CEST 2007


A couple of weeks ago, just before he left, Seth made a striking (800x)
improvement to the speed of graph::fromGXL.  This has been a big help.

However, I'd like to construct graphs incrementally, and  
programmatically,
without having to create new gxl files.  Is it possible that
Seth's efficiencies have not yet been applied to other methods in the  
graph class?

In particular, it seems that the following operations take a longish  
time
on graphs of a few thousand nodes:

   edgeData
   nodeData
   addEdge
   addNode

I could provide actual times and test cases if that would help.

Thanks!

  - Paul



On Sep 12, 2007, at 8:09 AM, Seth Falcon wrote:

> Hi again,
>
> Seth Falcon <sfalcon at FHCRC.ORG> writes:
>> Paul Shannon <pshannon at systemsbiology.org> writes:
>>> It took nearly 24 hours (!) to create a 16k node graph using two
>>> different techniques:
>>>
>>>     g = fromGXL (file ('someFile.gxl'))
>
> Using a patched version of fromGXL (on my laptop) I get:
>
>> library(graph)
>> con = file("kegg.yeast.gxl", open="r")
>> system.time(z <- fromGXL(con))
>        user  system elapsed
>     104.366   0.570 105.070
>> z
>     A graphNEL graph with undirected edges
>     Number of Nodes = 15158
>     Number of Edges = 32668
>> validObject(z)
>     [1] TRUE
>
> That's over 800x faster :-)
>
> I've checked in the changes in devel as graph 1.15.17.  You can get it
> from svn or wait till this time tomorrow (in either case, you will
> need R 2.6.0 alpha).
>
> The code passes the existing unit tests, but some extra scrutiny that
> the returned graphs are as desired is quite welcome.
>
> + seth
>
> -- 
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer  
> Research Center
> BioC: http://bioconductor.org/
> Blog: http://userprimary.net/user/



More information about the Bioconductor mailing list