[Rd] Subsetting the "ROW"s of an object
Berry, Charles
ccberry using ucsd.edu
Fri Jun 8 21:31:51 CEST 2018
> On Jun 8, 2018, at 11:52 AM, Hadley Wickham <h.wickham using gmail.com> wrote:
>
> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry using ucsd.edu> wrote:
>>
>>
>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages using fredhutch.org> wrote:
>>>
>>> Also the TRUEs cause problems if some dimensions are 0:
>>>
>>>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>> (subscript) logical subscript too long
>>
>> OK. But this is easy enough to handle.
>>
>>>
>>> H.
>>>
>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>> get recycled. (Maybe there is, or could be, ALTREP support for
>>>> recycling.)
>>>> Hadley
>>
>>
>> AFAICS, it is not an issue. Taking
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>
>> as a test case
>>
>> and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':
>>
>> subset_ROW4 <-
>> function(x, i, useLiteral=FALSE)
>> {
>> literal <- quote(x[i,,,,drop=FALSE])
>> mc <- quote(x[i])
>> nd <- max(1L, length(dim(x)))
>> mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>> mc[["drop"]] <- FALSE
>> if (useLiteral)
>> eval(literal)
>> else
>> eval(mc)
>> }
>>
>> I get identical times with
>>
>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>
>> and with
>>
>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>
> I think that's because you used a relatively low precision timing
> mechanism, and included the index generation in the timing. I see:
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length = 10, by = 100)
>
> bench::mark(
> arr[i, TRUE, TRUE, TRUE],
> arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #> expression min mean median max n_gc
> #> <chr> <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2
> #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2
>
> So not a huge difference, but it's there.
Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent.
But with subset_ROW4 I see no consistent difference.
In this example, it runs faster on average using `eval(mc)' to return the result:
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length=10,by=100)
> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
# A tibble: 2 x 8
expression min mean median max `itr/sec` mem_alloc n_gc
<chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms 28686. 5.05KB 5
2 subset_ROW4(arr, i, TRUE) 28.9µs 35µs 32.4µs 875.11µs 28572. 5.05KB 5
>
And on subsequent reps the lead switches back and forth.
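On Hervé's zero-extent case quoted above, one possible way to handle it is to fill the trailing subscripts with empty (missing) arguments instead of TRUE, so the constructed call is the same as the literal `x[i,,,,drop=FALSE]'. The following is only a sketch under that assumption, not tested beyond the small cases shown; the name subset_ROW_empty is made up for illustration.

```r
## Hypothetical variant of subset_ROW4 (illustrative name, not an existing function).
## It builds x[i, , ..., drop = FALSE] with one empty argument per trailing
## dimension, rather than recycling TRUE along each of them.
subset_ROW_empty <- function(x, i)
{
    nd <- max(1L, length(dim(x)))
    ## quote(expr = ) yields the empty symbol, i.e. a missing argument
    empty <- rep(list(quote(expr = )), nd - 1L)
    cl <- as.call(c(list(as.name("["), quote(x), quote(i)),
                    empty,
                    list(drop = FALSE)))
    ## x and i are looked up in this function's frame
    eval(cl)
}

## The case where the TRUE subscript fails: a 5 x 0 matrix
m0 <- matrix(raw(0), nrow = 5, ncol = 0)
dim(subset_ROW_empty(m0, 1:3))
```

Since the call that gets evaluated is the literal form, I would not expect any recycling cost here, but I have not benchmarked it.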
Chuck