[Rd] Revert to R 3.2.x code of logicalSubscript in subscript.c?

Suharto Anggono suharto_anggono at yahoo.com
Sun Oct 1 18:39:04 CEST 2017


Currently, in function 'logicalSubscript' in subscript.c, the case of no recycling is handled like the implementation of R function 'which': it passes through the data only once, but uses more memory. This has been the case since R 3.3.0. For the case of recycling, two passes are done, the first to get the number of elements in the result.

Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 'duplicate' on a logical index vector.

A side note: I guess that it is safe not to call 'duplicate' on a logical index vector, even if it is the very vector being modified in a subassignment, because it is converted to positive indices before being used in extraction or replacement. If so, isn't the same true for a character index vector as well?

Here are examples of subsetting a numeric vector of length 10^8 with a logical index vector, inspired by Hong Ooi's answer in https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type . I present two extreme cases, each with a no-recycling and a recycling version that convert to the same positive indices. The difference between the two versions can be attributed to function 'logicalSubscript'.

Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- FALSE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.083   0.000   0.083 
   user  system elapsed 
  0.085   0.000   0.085 
   user  system elapsed 
  0.144   0.000   0.144 
   user  system elapsed 
  0.143   0.000   0.144 

Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- TRUE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.538   0.741   1.292 
   user  system elapsed 
  0.506   0.668   1.175 
   user  system elapsed 
  0.448   0.534   0.986 
   user  system elapsed 
  0.431   0.528   0.960 

The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler . In example 2, where both versions took more time than in example 1, the no-recycling version took longer than the recycling version.

Function 'logicalSubscript' in subscript.c in R 3.2.x also uses faster code for the case of no recycling, but does two passes in all cases. Its treatment of the recycling case is identical to the current code.

Function 'logicalSubscript' in subscript.c also affects subsetting with negative indices, because negative indices are first converted to a logical index vector with the same length as the vector (no recycling).

Example comparing the times of x[-1] and its equivalent, x[2:length(x)]:
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])

Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed 
  0.591   0.903   1.515 
   user  system elapsed 
  0.558   0.822   1.384 
   user  system elapsed 
  0.620   0.659   1.285 
   user  system elapsed 
  0.607   0.663   1.274 

Output from R 3.2.2 in Zeppelin Notebook, https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
   user  system elapsed 
  1.156   1.636   2.794 
   user  system elapsed 
  0.884   1.528   2.413 
   user  system elapsed 
  0.932   1.544   2.476 
   user  system elapsed 
  0.932   1.584   2.519

From the above, x[-1] apparently took consistently longer than x[2:length(x)] with R 3.3.2, but not with R 3.2.2.

So, how about reverting to the R 3.2.x code of function 'logicalSubscript' in subscript.c?


