[R] Fast lookup in ragged array
Peter McMahan
peter.mcmahan at gmail.com
Fri Mar 16 17:45:34 CET 2007
Well, I hadn't ever seen RBGL before, so that's great. I've been
using igraph and sna mainly, but there are a few points lacking
between these two. RBGL solves a lot of problems for me!
But I'm not sure it will solve this specific problem. Are you
suggesting I use RBGL to do a depth-first search of all the
subgraphs? For this particular depth-first search I'm not searching
every subgraph, but just those that are constructed from a minimal
cutset of the parent subgraph. At each level of the search, I have to
compute graph cohesion (vertex connectivity), which can take
considerable time. A lot of computation time is saved by only
searching subgraphs obtained through cutsets. So a complete search of
all the subgraphs won't work, but the redundancy I come across is I
think unavoidable.
The particular algorithm I'm trying to implement is Moody and White's
cohesive blocking, in which the end result is a nested set of all
subgraphs with a higher cohesion (connectivity) than their parents.
(see http://www.santafe.edu/research/publications/workingpapers/
00-08-049.pdf )
On Mar 16, 2007, at 11:00 AM, Robert Gentleman wrote:
> why not just use the graph package and RBGL at www.bioconductor.org
>
>
> Peter McMahan wrote:
>
>> Hello,
>> I'm running an algorithm for graph structural cohesion that
>> requires a depth-first search of subgraphs of a rather large
>> network. The algorithm will necessarily be redundant in the
>> subgraphs it recurses to, so to speed up the process I
>> implemented a check at each subgraph to see if it's been searched
>> already.
>> This algorithm is very slow, and takes days to complete on a
>> graph with about 200 nodes. It seems that a significant portion
>> of the computation time is spent looking up the current subgraph
>> in the list of searched subgraphs to see if it is redundant, and
>> I'm wondering if there's a faster way to do this.
>> Currently, the list of searched subgraphs is a list
>> (`theseSearchedBranches`) of unnamed numeric vectors. I'm
>> checking against the list using the following command:
>> if(list(v) %in% theseSearchedBranches){
>> cat(" Branch already searched: passing.\n\n")
>> return(NULL)
>> }
>> v is a sorted numeric, with length anywhere between 3 and 200.
>> Because theseSearchedBranches gets quite long as the algorithm
>> progresses, the %in% command is taking maybe one or two seconds
>> per lookup.
>> So to reiterate, I have a list() that looks something like this:
>> [[1]]
>> [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
>> [18]56 72 75 76 85 95 105 110 134 158 159 165 186
>> [[2]]
>> [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
>> [18]56 72 75 76 85 95 105 110 134 147 159 165 186
>> [[3]]
>> [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
>> [18]50 56 72 75 85 95 105 110 134 147 158 159 165 186
>> ...
>> and so on for tens of thousands of entries, and I am trying to
>> find some sort of fast equivalent for %in% to search it. I'm also
>> not adding the vectors to the list in any particular order, as I
>> don't think %in% would know how to take advantage of that anyway.
>> Is there a data structure other than list() that I can use that
>> would be faster? Would it be better to just create a hashed env
>> and add empty variables named "0.5.6.11.12..."?
>> I know there are fast lookup algorithms out there that could take
>> advantage of the fact that the items being searched are
>> indiviually ordered numeric vectors, but I can't find any info
>> about R implementations on the mailing lists or help. Is there
>> any data type that implements a b-tree type of lookup scheme?
>> Any help would be greatly appreciated.
>> Thanks,
>> Peter
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
>
More information about the R-help
mailing list