[Rd] Revert to R 3.2.x code of logicalSubscript in subscript.c?
Radford Neal
radford at cs.toronto.edu
Tue Oct 3 02:00:25 CEST 2017
Suharto,
If you're interested in performance with subscripting, you might want
to look at pqR (pqR-project.org). It has some substantial performance
improvements for subscripting over R Core versions. This is
especially true for the current development version of pqR (probably
leading to a new release in about a month).
You can look at a somewhat-stable snapshot of recent pqR development at
https://github.com/radfordneal/pqR/tree/05e32fa6
In particular, src/main/subscript.c might be of interest.
Note that you should read mods-dir/README if you want to build this,
and in particular, you need to run create-configure in the top-level
source directory first.
I modified your tests a bit, including producing versions using both
vectors of length 1e8 like you did (which will not fit in cache) and
vectors of length 1e5 (which will fit in at least the L3 cache). I
ran tests on an Intel Skylake processor (E3-1270v5 @ 3.6GHz), using
gcc 7.2 with -O3 -march=native -mtune=native.
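For readers who want to repeat the tests, here is a minimal sketch of how they might be wrapped in one function. The function name run_subscript_tests is hypothetical (it is not part of the original test script), and the labels T1 to T7 simply follow the order of the timings reported in this message. It assumes N is at least 2. Wrapping each subscript in a closure adds a little call overhead and may change which fast paths an interpreter takes, so the transcripts remain the authoritative measurements.

```r
# Hypothetical helper, not from the original script: times the seven
# subscripting tests on a numeric vector of length N, repeating each
# test R times, and returns the elapsed seconds per test.
run_subscript_tests <- function(N, R) {
  x <- numeric(N)
  v <- 2:length(x)                      # index vector built ahead of time
  idx_false <- rep(FALSE, length(x))    # full-length logical, no recycling
  idx_true  <- rep(TRUE, length(x))
  tests <- list(
    T1 = function() x[idx_false],
    T2 = function() x[FALSE],           # length-1 logical, recycled
    T3 = function() x[idx_true],
    T4 = function() x[TRUE],            # length-1 logical, recycled
    T5 = function() x[-1],
    T6 = function() x[2:length(x)],
    T7 = function() x[v]
  )
  vapply(tests, function(f) {
    system.time(for (r in seq_len(R)) a <- f())[["elapsed"]]
  }, numeric(1))
}

# Same sizes as the small-vector runs
small_times <- run_subscript_tests(1e5, 1000)
print(small_times)
```

For the large-vector case, run_subscript_tests(1e8, 5) would match the settings used here, given enough memory.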
I got the following results with R-3.4.2 (with R_ENABLE_JIT=0, which
is slightly faster than using the JIT compiler):
R-3.4.2, LARGE VECTORS:
> N <- 1e8; R <- 5
> #N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.296 0.000 0.297
> i <- FALSE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.416 0.000 0.418
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
1.416 0.352 1.773
> i <- TRUE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
1.348 0.264 1.613
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
user system elapsed
1.516 0.376 1.895
> system.time(for (r in 1:R) a <- x[2:length(x)])
user system elapsed
1.516 0.308 1.827
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
user system elapsed
1.416 0.268 1.689
R-3.4.2, SMALL VECTORS:
> #N <- 1e8; R <- 5
> N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.088 0.000 0.089
> i <- FALSE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.084 0.000 0.084
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.492 0.020 0.515
> i <- TRUE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.408 0.008 0.420
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
user system elapsed
0.508 0.004 0.516
> system.time(for (r in 1:R) a <- x[2:length(x)])
user system elapsed
0.464 0.008 0.473
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
user system elapsed
0.428 0.000 0.428
Here are the results with the development version of pqR (uncompressed
pointers, no byte compilation):
pqR (devel), LARGE VECTORS:
> N <- 1e8; R <- 5
> #N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.192 0.000 0.193
> i <- FALSE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.436 0.000 0.434
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.768 0.216 0.988
> i <- TRUE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.832 0.272 1.105
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
user system elapsed
0.280 0.156 0.435
> system.time(for (r in 1:R) a <- x[2:length(x)])
user system elapsed
0.252 0.184 0.436
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
user system elapsed
0.828 0.168 0.998
pqR (devel), SMALL VECTORS:
> #N <- 1e8; R <- 5
> N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.040 0.000 0.038
> i <- FALSE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.084 0.000 0.087
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))# no recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.156 0.036 0.192
> i <- TRUE# recycling
> system.time(for (r in 1:R) a <- x[i])
user system elapsed
0.184 0.012 0.195
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
user system elapsed
0.060 0.012 0.075
> system.time(for (r in 1:R) a <- x[2:length(x)])
user system elapsed
0.052 0.024 0.075
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
user system elapsed
0.180 0.004 0.182
Summarizing elapsed times in seconds (T1 to T7 are the seven timings,
in the order they appear in the transcripts above):
LARGE VECTORS     T1     T2     T3     T4     T5     T6     T7
R-3.4.2:       0.297  0.418  1.773  1.613  1.895  1.827  1.689
pqR dev:       0.193  0.434  0.988  1.105  0.435  0.436  0.998
SMALL VECTORS     T1     T2     T3     T4     T5     T6     T7
R-3.4.2:       0.089  0.084  0.515  0.420  0.516  0.473  0.428
pqR dev:       0.038  0.087  0.192  0.195  0.075  0.075  0.182
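To make the comparison easier to read, the ratios of the elapsed times can be computed directly from the summary. This is only a sketch of the arithmetic: the numbers are copied from the table, and the variable names (large, small, speedup) are chosen here for illustration.

```r
# Elapsed times in seconds, copied from the summary table
large <- rbind(
  "R-3.4.2" = c(0.297, 0.418, 1.773, 1.613, 1.895, 1.827, 1.689),
  "pqR dev" = c(0.193, 0.434, 0.988, 1.105, 0.435, 0.436, 0.998)
)
small <- rbind(
  "R-3.4.2" = c(0.089, 0.084, 0.515, 0.420, 0.516, 0.473, 0.428),
  "pqR dev" = c(0.038, 0.087, 0.192, 0.195, 0.075, 0.075, 0.182)
)
colnames(large) <- colnames(small) <- paste0("T", 1:7)

# Ratio of R-3.4.2 time to pqR time; values above 1 mean pqR was faster
speedup <- rbind(
  LARGE = large["R-3.4.2", ] / large["pqR dev", ],
  SMALL = small["R-3.4.2", ] / small["pqR dev", ]
)
print(round(speedup, 2))
```

On these figures the T5 and T6 ratios come out above 4 for the large vectors and above 6 for the small ones, while the T2 ratio is just under 1 in both cases.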
As one can see, pqR is substantially faster for all except T2 (where
it's about the same). The very large advantage of pqR on T5 and T6 is
partly because pqR has special code for efficiently handling things
like x[-1] and x[2:length(x)], so I added the x[v] test (T7) to see
what performance is like when this special handling isn't invoked.
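As a small sanity check on what is being compared, the three ways of dropping the first element give identical results, and a length-1 logical subscript is recycled to give the same result as the full-length one. The sketch below uses a short vector with distinct values (an assumption made here so the check is meaningful; the timing tests use all-zero vectors).

```r
# Short vector with distinct values, for illustration only
x <- as.numeric(1:10)
v <- 2:length(x)

a5 <- x[-1]             # negative index
a6 <- x[2:length(x)]    # range written inline in the subscript
a7 <- x[v]              # same range, stored in a variable first

# All three select elements 2..length(x)
stopifnot(identical(a5, a6), identical(a6, a7))

# A length-1 logical subscript is recycled to the length of x
stopifnot(identical(x[TRUE],  x[rep(TRUE,  length(x))]))
stopifnot(identical(x[FALSE], x[rep(FALSE, length(x))]))
```

Since the results are the same, any timing difference between these forms comes from how the subscript is processed, not from what is selected.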
There's no particular reason pqR's code for these operations couldn't
be adapted for use in the R Core implementation, though there are
probably a few issues involving large vectors, and the special
handling of x[2:length(x)] would require implementing pqR's internal
"variant result" mechanism. pqR also has much faster code for some
other subset and subset assignment operations.
Radford Neal