[Rd] performance of vector subscripting via character index
Jim Regetz
regetz at nceas.ucsb.edu
Mon Sep 7 21:03:04 CEST 2009
Hi all,
Using character indexing on a vector is quite fast up through a vector
length of 46340, then it suddenly gets 3 orders of magnitude slower.
This is true at least of the special case in which the index vector is
the complete (though possibly out-of-order) set of vector names:
test <- function(n) {
  vec <- seq_len(n)
  names(vec) <- as.character(vec)
  ind <- rev(names(vec))
  system.time(vec[ind])
}
test(46340)
## user system elapsed
## 0.012 0.000 0.009
test(46341)
## user system elapsed
## 11.805 0.000 11.805
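As a side note, and possibly only a coincidence, 46340 is the largest
integer whose square still fits in a signed 32-bit integer. A quick check
using only base R (the variable names are just for illustration):

```r
# Largest n such that n^2 does not exceed the 32-bit signed integer maximum
int_max <- .Machine$integer.max   # 2147483647
n_max <- floor(sqrt(int_max))

n_max                             # 46340
n_max^2 <= int_max                # TRUE
(n_max + 1)^2 <= int_max          # FALSE
```

This does not establish the cause of the slowdown; it only shows that the
first threshold lines up with that limit.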
There seems to be a rebound at just over twice the value of the
threshold above, though I'll admit I didn't have the stamina to test all
values in between:
test(92689)
## user system elapsed
## 48.951 0.000 48.946
test(92690)
## user system elapsed
## 0.036 0.000 0.038
And then worse again...
test(139022)
## user system elapsed
## 0.068 0.003 0.071
test(139023)
## user system elapsed
## 114.239 0.000 114.279
I see this on both Ubuntu 9.04 and OS X 10.6, using R 2.9.2 in both
cases. Has this behavior already been identified? Using 'match' instead
of direct character indexing is a serviceable workaround ... is it in
fact the recommended approach in this case?
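For reference, here is a minimal sketch of the 'match' workaround, assuming
the names are unique and non-empty. The helper name index_by_name is
hypothetical and not part of base R:

```r
# Hypothetical helper: subscript a named vector by name via match()
# instead of direct character indexing.
index_by_name <- function(vec, ind) {
  pos <- match(ind, names(vec))  # integer positions; NA where a name is absent
  vec[pos]
}

# Small example using the same setup as test() above
n <- 1000
vec <- seq_len(n)
names(vec) <- as.character(vec)
ind <- rev(names(vec))

res <- index_by_name(vec, ind)
identical(res, vec[ind])
```

With duplicated or empty names the two approaches may not agree in every
edge case, so this is only a stand-in for the case described here.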
Thanks,
Jim
------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101