[Rd] Revert to R 3.2.x code of logicalSubscript in subscript.c?
Suharto Anggono Suharto Anggono
suharto_anggono at yahoo.com
Sun Oct 1 18:39:04 CEST 2017
Currently, function 'logicalSubscript' in subscript.c handles the case of no recycling like the implementation of R function 'which': it passes through the data only once, but uses more memory. This has been so since R 3.3.0. For the case of recycling, two passes are made, the first to get the number of elements in the result.
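To make the trade-off concrete, here is a rough sketch in R of the two strategies. It is only an illustration of the idea, not the C code in subscript.c; the function names 'two_pass' and 'one_pass' are made up for this example, and the handling of NA (kept as NA in the result) is my assumption about what matters for subsetting.

```r
# Two passes: count the selected elements first, then fill a result
# of exactly that size. Less memory, but the index is read twice.
two_pass <- function(i) {
  n <- length(i)
  count <- 0L
  for (k in seq_len(n)) {
    if (is.na(i[k]) || i[k]) count <- count + 1L
  }
  out <- integer(count)
  pos <- 0L
  for (k in seq_len(n)) {
    if (is.na(i[k])) {
      pos <- pos + 1L
      out[pos] <- NA_integer_
    } else if (i[k]) {
      pos <- pos + 1L
      out[pos] <- k
    }
  }
  out
}

# One pass: fill a buffer as long as the index, then keep only the
# part that was used. The index is read once, but more memory is needed.
one_pass <- function(i) {
  n <- length(i)
  buf <- integer(n)
  pos <- 0L
  for (k in seq_len(n)) {
    if (is.na(i[k])) {
      pos <- pos + 1L
      buf[pos] <- NA_integer_
    } else if (i[k]) {
      pos <- pos + 1L
      buf[pos] <- k
    }
  }
  buf[seq_len(pos)]
}

i <- c(TRUE, FALSE, NA, TRUE)
x <- c(10, 20, 30, 40)
two_pass(i)
one_pass(i)
```

Both functions should give the same positive indices, and x[two_pass(i)] should match x[i]; they differ only in how much memory is allocated and how many times the index is scanned.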
Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 'duplicate' on a logical index vector.
A side note: I guess that it is safe not to call 'duplicate' on a logical index vector, even if it is the one being modified in subassignment, because it is converted to positive indices before use in extraction or replacement. If so, isn't the same true for a character index vector as well?
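The situation in the side note, where the index vector is the very object being modified, can be written at user level as follows. This is only a small illustration with made-up values; it shows the visible result and says nothing by itself about whether 'duplicate' is needed internally.

```r
# Logical index vector that is also the object being modified.
i <- c(TRUE, FALSE, TRUE)
i[i] <- FALSE
i

# Character index vector that is also the object being modified.
# The values "b" and "a" are matched against the names of 'v'.
v <- c(a = "b", b = "a")
v[v] <- "z"
v
```

In both cases every selected position is replaced, which is what one would expect if the index is fully converted to positions before any element is overwritten.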
Here are examples of subsetting a numeric vector of length 10^8 with a logical index vector, inspired by Hong Ooi's answer in https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type . I present two extreme cases, each with no-recycling and recycling versions that convert to the same positive indices. The difference between the two versions can be attributed to function 'logicalSubscript'.
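That the two versions convert to the same positive indices can be checked on a small vector. This is a sketch using 'which' and 'rep_len' at R level as a stand-in for the conversion done in C; the variable names are only for this illustration.

```r
x <- c(1.5, 2.5, 3.5, 4.5)

# Select all: full-length index versus recycled scalar index.
i_full <- rep(TRUE, length(x))
i_rec <- TRUE
idx_full <- which(i_full)
idx_rec <- which(rep_len(i_rec, length(x)))

# Select none: the same comparison with FALSE.
none_full <- which(rep(FALSE, length(x)))
none_rec <- which(rep_len(FALSE, length(x)))

identical(idx_full, idx_rec)
identical(none_full, none_rec)
identical(x[i_full], x[i_rec])
```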
Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- FALSE  # recycling
system.time(x[i])
system.time(x[i])
Output:
   user  system elapsed
  0.083   0.000   0.083
   user  system elapsed
  0.085   0.000   0.085
   user  system elapsed
  0.144   0.000   0.144
   user  system elapsed
  0.143   0.000   0.144
Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- TRUE  # recycling
system.time(x[i])
system.time(x[i])
Output:
   user  system elapsed
  0.538   0.741   1.292
   user  system elapsed
  0.506   0.668   1.175
   user  system elapsed
  0.448   0.534   0.986
   user  system elapsed
  0.431   0.528   0.960
The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler . In example 2, where both versions took more time than in example 1, the no-recycling version took longer than the recycling version.
Function 'logicalSubscript' in subscript.c in R 3.2.x also uses faster code for the case of no recycling, but makes two passes in all cases. Its treatment of the recycling case is identical to the current code.
Function 'logicalSubscript' in subscript.c affects subsetting with negative indices, because negative indices are converted first to a logical index vector with the same length as the vector (no recycling).
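The conversion of negative indices can be imitated at R level like this. It is a sketch of the idea only, assuming the logical vector starts as all TRUE and the excluded positions are set to FALSE; the names 'neg' and 'keep' are made up.

```r
x <- c(10, 20, 30, 40, 50)
neg <- -1

# Full-length logical index: TRUE everywhere except the excluded positions.
keep <- rep(TRUE, length(x))
keep[abs(neg)] <- FALSE

identical(x[neg], x[keep])
identical(x[neg], x[2:length(x)])
```

Because 'keep' has the same length as 'x', subsetting with it goes through the no-recycling path.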
Example, comparing times of x[-1] and its equivalent, x[2:length(x)] :
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])
Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed
  0.591   0.903   1.515
   user  system elapsed
  0.558   0.822   1.384
   user  system elapsed
  0.620   0.659   1.285
   user  system elapsed
  0.607   0.663   1.274
Output from R 3.2.2 in Zeppelin Notebook, https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
   user  system elapsed
  1.156   1.636   2.794
   user  system elapsed
  0.884   1.528   2.413
   user  system elapsed
  0.932   1.544   2.476
   user  system elapsed
  0.932   1.584   2.519
From the above, x[-1] apparently took consistently longer than x[2:length(x)] with R 3.3.2, but not with R 3.2.2.
So, how about reverting to R 3.2.x code of function 'logicalSubscript' in subscript.c?