[Rd] Subsetting the "ROW"s of an object
Michael Lawrence
lawrence.michael at gene.com
Fri Jun 8 22:56:36 CEST 2018
Actually, it's sort of the opposite. Everything becomes a sequence of
integers internally, even when the argument is missing. So the same
amount of work is done, basically. ALTREP will let us improve this
sort of thing.
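
As a minimal sketch of that equivalence (the small array and the names
missing_idx, true_idx and explicit are made up for illustration, not
taken from the thread), all three forms below should return the same
result, whether the subscript is missing, a recycled TRUE, or spelled
out as an integer sequence:

```r
# Small array so the check is cheap; the dimensions are arbitrary.
arr <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))
i <- c(1, 3)

# Missing subscripts for the trailing dimensions.
missing_idx <- arr[i, , , drop = FALSE]

# Length-1 logical subscripts, recycled along each dimension.
true_idx <- arr[i, TRUE, TRUE, drop = FALSE]

# Explicit integer sequences covering each trailing dimension.
explicit <- arr[i, seq_len(dim(arr)[2]), seq_len(dim(arr)[3]), drop = FALSE]

stopifnot(
  identical(missing_idx, true_idx),
  identical(missing_idx, explicit)
)
```

This only checks that the results agree; it says nothing about what
happens inside the C code or about relative timings.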
Michael
On Fri, Jun 8, 2018 at 1:49 PM, Hadley Wickham <h.wickham using gmail.com> wrote:
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:
>
> dims <- c(4, 4, 4, 1e5)
>
> arr <- array(rnorm(prod(dims)), dims)
> dim(arr)
> #> [1] 4 4 4 100000
> i <- c(1, 3)
>
> bench::mark(
>   arr[i, TRUE, TRUE, TRUE],
>   arr[i, , , ]
> )[c("expression", "min", "mean", "max")]
> #> # A tibble: 2 x 4
> #>   expression               min      mean     max
> #>   <chr>                    <bch:tm> <bch:tm> <bch:tm>
> #> 1 arr[i, TRUE, TRUE, TRUE] 41.8ms   43.6ms   46.5ms
> #> 2 arr[i, , , ]             41.7ms   43.1ms   46.3ms
>
>
> On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles <ccberry using ucsd.edu> wrote:
>>
>>
>>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham <h.wickham using gmail.com> wrote:
>>>
>>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry using ucsd.edu> wrote:
>>>>
>>>>
>>>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages using fredhutch.org> wrote:
>>>>>
>>>>> Also the TRUEs cause problems if some dimensions are 0:
>>>>>
>>>>>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>>> (subscript) logical subscript too long
>>>>
>>>> OK. But this is easy enough to handle.
>>>>
>>>>>
>>>>> H.
>>>>>
>>>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>>>> get recycled. (Maybe there is, or could be, ALTREP support for
>>>>>> recycling.)
>>>>>> Hadley
>>>>
>>>>
>>>> AFAICS, it is not an issue. Taking
>>>>
>>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>>>
>>>> as a test case
>>>>
>>>> and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':
>>>>
>>>> subset_ROW4 <-
>>>>     function(x, i, useLiteral=FALSE)
>>>> {
>>>>     literal <- quote(x[i,,,,drop=FALSE])
>>>>     mc <- quote(x[i])
>>>>     nd <- max(1L, length(dim(x)))
>>>>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>>>     mc[["drop"]] <- FALSE
>>>>     if (useLiteral)
>>>>         eval(literal)
>>>>     else
>>>>         eval(mc)
>>>> }
>>>>
>>>> I get identical times with
>>>>
>>>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>>>
>>>> and with
>>>>
>>>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>>
>>> I think that's because you used a relatively low precision timing
>>> mechanism, and included the index generation in the timing. I see:
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>> i <- seq(1,length = 10, by = 100)
>>>
>>> bench::mark(
>>>   arr[i, TRUE, TRUE, TRUE],
>>>   arr[i, , , ]
>>> )
>>> #> # A tibble: 2 x 1
>>> #>   expression    min     mean    median   max      n_gc
>>> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
>>> #> 1 arr[i, TRUE,… 7.4µs   10.9µs  10.66µs  1.22ms   2
>>> #> 2 arr[i, , , ]  7.06µs  8.8µs   7.85µs   538.09µs 2
>>>
>>> So not a huge difference, but it's there.
>>
>>
>> Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent.
>>
>> But with subset_ROW4 I see no consistent difference.
>>
>> In this example, it runs faster on average using `eval(mc)' to return the result:
>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>> i <- seq(1,length=10,by=100)
>>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
>> # A tibble: 2 x 8
>>   expression                 min      mean     median   max      `itr/sec` mem_alloc n_gc
>>   <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl>     <bch:byt> <dbl>
>> 1 subset_ROW4(arr, i, FALSE) 28.9µs   34.9µs   32.1µs   1.36ms   28686.    5.05KB    5
>> 2 subset_ROW4(arr, i, TRUE)  28.9µs   35µs     32.4µs   875.11µs 28572.    5.05KB    5
>>>
>>
>> And on subsequent reps the lead switches back and forth.
>>
>>
>> Chuck
>>
>
>
>
> --
> http://hadley.nz
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>