[Rd] Subsetting the "ROW"s of an object

Hervé Pagès hpages using fredhutch.org
Fri Jun 8 21:13:35 CEST 2018


A missing subscript is still preferable to a TRUE, though, because it
explicitly means "take it all". A TRUE achieves the same result, but only
via implicit recycling. For example, x[ , , ] and x[TRUE, TRUE, TRUE]
select the same elements (if length(x) != 0) and are both no-ops. The
subsetting code can immediately and easily detect the former as a no-op,
whereas it probably cannot do so as easily for the latter. In that case
it will most likely allocate a copy of 'x' and fill the new array by
walking the whole of the original.
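
To make this concrete, here is a minimal sketch of a helper that builds
the call with missing subscripts instead of TRUEs. The name
subset_ROW_missing is hypothetical (it is not from this thread or from any
package), and the approach shown, splicing empty symbols into the call
with as.call(), is only one of several ways to do it:

```r
# Hypothetical helper (not from the thread): subset along the first
# dimension, leaving every other subscript missing instead of TRUE.
subset_ROW_missing <- function(x, i) {
    nd <- max(1L, length(dim(x)))
    # quote(expr = ) yields the empty symbol, i.e. a missing argument
    empty <- rep(list(quote(expr = )), nd - 1L)
    mc <- as.call(c(list(as.name("["), quote(x), quote(i)),
                    empty, list(drop = FALSE)))
    eval(mc)    # for a 3D array this evaluates x[i, , , drop = FALSE]
}

# Same result as the TRUE-based form when no dimension has extent 0
arr <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))
stopifnot(identical(subset_ROW_missing(arr, 2L),
                    arr[2L, TRUE, TRUE, drop = FALSE]))

# A zero-extent dimension is not a problem for a missing subscript
m0 <- matrix(raw(0), nrow = 5, ncol = 0)
dim(subset_ROW_missing(m0, 1:3))    # 3 0
```

The checks use only arrays with two or more dimensions; I have not
exercised the sketch on plain vectors or on classed objects such as
data frames.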

H.

On 06/08/2018 11:52 AM, Hadley Wickham wrote:
> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry using ucsd.edu> wrote:
>>
>>
>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages using fredhutch.org> wrote:
>>>
>>> Also the TRUEs cause problems if some dimensions are 0:
>>>
>>>   > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>>   Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>     (subscript) logical subscript too long
>>
>> OK. But this is easy enough to handle.
>>
>>>
>>> H.
>>>
>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>> get recycled. (Maybe there is, or could be, ALTREP support for
>>>> recycling.)
>>>> Hadley
>>
>>
>> AFAICS, it is not an issue. Taking
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>
>> as a test case
>>
>> and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':
>>
>> subset_ROW4 <-
>>       function(x, i, useLiteral=FALSE)
>> {
>>      literal <- quote(x[i,,,,drop=FALSE])
>>      mc <- quote(x[i])
>>      nd <- max(1L, length(dim(x)))
>>      mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>      mc[["drop"]] <- FALSE
>>      if (useLiteral)
>>          eval(literal)
>>      else
>>          eval(mc)
>>   }
>>
>> I get identical times with
>>
>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>
>> and with
>>
>> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
> 
> I think that's because you used a relatively low-precision timing
> mechanism, and included the index generation in the timing. I see:
> 
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length = 10, by = 100)
> 
> bench::mark(
>    arr[i, TRUE, TRUE, TRUE],
>    arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #>   expression        min    mean   median      max  n_gc
> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
> 
> So not a huge difference, but it's there.
> 
> Hadley
> 
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


