[Rd] performance of vector subscripting via character index

Mon Sep 7 21:03:04 CEST 2009

Hi all,

Using character indexing on a vector is quite fast up through a vector
length of 46340, but then it suddenly gets 3 orders of magnitude slower.
This is true at least of the special case in which the index vector is
the complete (though possibly out-of-order) set of vector names:

test <- function(n) {
     vec <- seq_len(n)
     names(vec) <- as.character(vec)  # names "1", "2", ..., "n"
     ind <- rev(names(vec))           # every name, in reverse order
     system.time(vec[ind])            # time the character subscript only
}

test(46340)
##    user  system elapsed
##   0.012   0.000   0.009
test(46341)
##    user  system elapsed
##  11.805   0.000  11.805

There seems to be a rebound at just over twice that threshold, though
I'll admit I didn't have the stamina to test all the values in between:

test(92689)
##    user  system elapsed
##  48.951   0.000  48.946
test(92690)
##    user  system elapsed
##   0.036   0.000   0.038

And then it gets worse again:

test(139022)
##    user  system elapsed
##   0.068   0.003   0.071
test(139023)
##    user  system elapsed
## 114.239   0.000 114.279
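
For what it's worth, the boundaries above are consistent with an integer
overflow: 46340 is the largest n for which n^2 still fits in a signed
32-bit integer (.Machine$integer.max is 2147483647). The sketch below is
only a hypothesis, not something read from R's source. It assumes the
subscript code uses hashing only when ns * nx > 15 * nx + ns, with the
product computed as a C int that wraps around on overflow, and otherwise
falls back to a much slower lookup. The constant 15 is my assumption,
picked because it reproduces the 92689/92690 boundary; wrap_int32 and
uses_hashing are hypothetical helper names.

```r
# Hypothetical model, not taken from R's source: hashing is used only when
# ns * nx > 15 * nx + ns, where ns * nx is a 32-bit C int that wraps around
# on overflow (two's-complement wraparound is assumed).

# Emulate 32-bit signed wraparound; doubles represent these products exactly.
wrap_int32 <- function(x) {
  x <- x %% 2^32
  ifelse(x >= 2^31, x - 2^32, x)
}

# TRUE if the hypothetical rule would choose hashing for an index of
# length ns into a vector of length nx (both equal to n in test() above).
uses_hashing <- function(ns, nx = ns) {
  wrap_int32(ns * nx) > 15 * nx + ns
}

n <- c(46340, 46341, 92689, 92690, 139022, 139023)
data.frame(n = n, hashing = uses_hashing(n))
```

Under this model every timing above lines up: the TRUE rows are the fast
cases and the FALSE rows are the slow ones.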

I see this on both Ubuntu 9.04 and OS X 10.6, using R 2.9.2 in both 
cases. Has this behavior already been identified? Using 'match' instead
of direct character indexing is a serviceable workaround, but is it in
fact the recommended approach in this case?
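
For reference, the match() workaround looks like this. subset_by_match
is a hypothetical helper name, and the equivalence check only covers the
case tested here, where every index is an existing, unique name:

```r
# Sketch of the match()-based workaround: look up each requested name's
# position once, then subscript by integer position instead of by name.
subset_by_match <- function(vec, ind) {
  pos <- match(ind, names(vec))  # NA for names not present in vec
  vec[pos]                       # integer subscripting keeps the names
}

vec <- seq_len(5)
names(vec) <- as.character(vec)
ind <- rev(names(vec))
identical(subset_by_match(vec, ind), vec[ind])  # TRUE
```

Timing it the same way as test(), e.g. system.time(subset_by_match(vec,
ind)) with n = 46341, is the comparison that matters on an affected build.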

Thanks,
Jim

------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101