[R] Fast lookup in ragged array
Peter McMahan
peter.mcmahan at gmail.com
Fri Mar 16 16:42:06 CET 2007
Hello,
I'm running an algorithm for graph structural cohesion that requires
a depth-first search of subgraphs of a rather large network. The
algorithm will necessarily be redundant in the subgraphs it recurses
to, so to speed up the process I implemented a check at each subgraph
to see if it's been searched already.
This algorithm is very slow, and takes days to complete on a graph
with about 200 nodes. It seems that a significant portion of the
computation time is spent looking up the current subgraph in the list
of searched subgraphs to see if it is redundant, and I'm wondering if
there's a faster way to do this.
Currently, the list of searched subgraphs is a list
(`theseSearchedBranches`) of unnamed numeric vectors. I'm checking
against the list using the following command:
if(list(v) %in% theseSearchedBranches){
cat(" Branch already searched: passing.\n\n")
return(NULL)
}
v is a sorted numeric, with length anywhere between 3 and 200.
Because theseSearchedBranches gets quite long as the algorithm
progresses, the %in% command is taking maybe one or two seconds per
lookup.
So to reiterate, I have a list() that looks something like this:
[[1]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]56 72 75 76 85 95 105 110 134 158 159 165 186
[[2]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]56 72 75 76 85 95 105 110 134 147 159 165 186
[[3]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]50 56 72 75 85 95 105 110 134 147 158 159 165 186
...
and so on for tens of thousands of entries, and I am trying to find
some sort of fast equivalent for %in% to search it. I'm also not
adding the vectors to the list in any particular order, as I don't
think %in% would know how to take advantage of that anyway.
Is there a data structure other than list() that I can use that would
be faster? Would it be better to just create a hashed env and add
empty variables named "0.5.6.11.12..."?
I know there are fast lookup algorithms out there that could take
advantage of the fact that the items being searched are indiviually
ordered numeric vectors, but I can't find any info about R
implementations on the mailing lists or help. Is there any data type
that implements a b-tree type of lookup scheme?
Any help would be greatly appreciated.
Thanks,
Peter
More information about the R-help
mailing list