[Rd] [External] Re: Any plans for ALTREP lists (VECSXP)?

Tierney, Luke luke-tierney using uiowa.edu
Wed Jul 24 17:25:01 CEST 2019


If one of you wants to try creating a patch to support ALTREP
generic vectors, here are some notes:

The main challenge I am aware of (there might be others): Allowing
DATAPTR to return a writable pointer would be too dangerous because
the GC write barrier needs to see all mutations. So it would be best
if Dataptr and Dataptr_or_null methods were not allowed to be
defined. The default methods in altrep.c should do the right thing.
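
To make the write-barrier concern concrete, here is a minimal toy model in
plain C. It is only a sketch under simplifying assumptions: it does not use
R's headers, and every name in it (toy_obj, toy_list, toy_set_elt,
toy_dataptr, the 'remembered' flag) is hypothetical. It assumes a
two-generation collector that must be told whenever an old container starts
pointing at a young object.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in R. */
#define TOY_LEN 4

typedef struct {
    int generation;              /* 0 = young, 1 = old */
} toy_obj;

typedef struct {
    int generation;              /* generation of the list itself */
    int remembered;              /* 1 once an old-to-young store was recorded */
    toy_obj *elts[TOY_LEN];      /* the list payload */
} toy_list;

/* Barrier-checked store: the one place where mutations are recorded. */
static void toy_set_elt(toy_list *x, size_t i, toy_obj *v)
{
    assert(i < TOY_LEN);
    if (x->generation > v->generation)
        x->remembered = 1;       /* tell the collector about the new edge */
    x->elts[i] = v;
}

/* What a writable data pointer amounts to: raw access to the payload. */
static toy_obj **toy_dataptr(toy_list *x)
{
    return x->elts;
}
```

A store made through the pointer returned by toy_dataptr() changes the
payload but leaves 'remembered' untouched, so in this model the collector
never learns about the new reference. The same store made through
toy_set_elt() is recorded.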

A reasonable name for the abstract class would be 'altlist'.

'altrep' methods that a class can provide:

   Unserialize or UnserializeEX
   Serialized_state
   Duplicate or DuplicateEX
   Coerce
   Inspect
   Length

'altvec' methods a class should provide:

   Extract_subset
   not Dataptr
   not Dataptr_or_null

'altlist' specific methods:

   Elt
   Set_elt
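
As a rough illustration of how the method lists above fit together, here is
a self-contained toy in plain C. It is a sketch, not the altrep.h API: the
types toy_elt, toy_altlist and toy_altlist_methods, the fixed capacity
TOY_MAX, and the "load from file" stand-in are all assumptions made for the
example. The method table has Length, Elt and Set_elt slots and
deliberately has no Dataptr slot, so all element access goes through Elt
and Set_elt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of this is R's actual API. */
#define TOY_MAX 8

typedef struct { size_t len; } toy_elt;   /* stands in for an element SEXP */
typedef struct toy_altlist toy_altlist;

typedef struct {
    size_t   (*Length)(const toy_altlist *x);
    toy_elt *(*Elt)(toy_altlist *x, size_t i);
    void     (*Set_elt)(toy_altlist *x, size_t i, toy_elt v);
    /* no Dataptr or Dataptr_or_null slot on purpose */
} toy_altlist_methods;

struct toy_altlist {
    const toy_altlist_methods *methods;
    size_t n;                    /* number of elements */
    int loaded[TOY_MAX];         /* 1 if cache[i] has been materialized */
    toy_elt cache[TOY_MAX];      /* materialized elements */
    int loads;                   /* how many elements were read on demand */
};

static size_t lazy_length(const toy_altlist *x) { return x->n; }

/* Elt: materialize the element on first access, then serve it from cache. */
static toy_elt *lazy_elt(toy_altlist *x, size_t i)
{
    assert(i < x->n);
    if (!x->loaded[i]) {
        x->cache[i].len = 100 + i;   /* stand-in for reading from a file */
        x->loaded[i] = 1;
        x->loads++;
    }
    return &x->cache[i];
}

/* Set_elt: the single entry point for mutation. */
static void lazy_set_elt(toy_altlist *x, size_t i, toy_elt v)
{
    assert(i < x->n);
    x->cache[i] = v;
    x->loaded[i] = 1;
}

static const toy_altlist_methods lazy_methods = {
    lazy_length, lazy_elt, lazy_set_elt
};

static toy_altlist toy_make_lazy(size_t n)
{
    toy_altlist x = { &lazy_methods, n, {0}, {{0}}, 0 };
    assert(n <= TOY_MAX);
    return x;
}
```

In this toy, creating the list reads nothing; an element is only produced
when Elt is first called for its index, and repeated calls reuse the cached
value. A real class would also need the 'altrep' and 'altvec' methods
listed above.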

Best,

luke

On Tue, 23 Jul 2019, Gabriel Becker wrote:

> Hi Kylie,
>
> Is it a list with only numerics in it? (I only see REALSXPs there, but
> obviously inspect isn't showing all of them). If so, you could load it up
> into one big vector and then also keep partitioning information around.
> Bioconductor does this (see ?IRanges::CompressedList ). The potential
> benefit here being that the underlying large vector could then be a big
> out-of-memory altrep. How helpful this would be depends somewhat on what
> you want to do with it, of course, but it is something that comes to mind.
>
> Also, I would expect some overhead but that seems like a lot (without
> having done super much in the way of benchmarking). What exactly is
> as.altrep doing?
>
> Best,
> ~G
>
> On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel <
> r-devel using r-project.org> wrote:
>
>> Hi Kylie,
>>
>> As an alternative in the short term, you could consider deriving from
>> S4Vectors' List class, implementing the getListElement() method to
>> lazily create the objects.
>>
>> Michael
>>
>> On Tue, Jul 23, 2019 at 9:09 AM Bemis, Kylie <k.bemis using northeastern.edu>
>> wrote:
>>>
>>> Hello,
>>>
>>> I was wondering if there were any plans for ALTREP lists (VECSXP)?
>>>
>>> It seems to me that they could be supported in a similar way to how
>>> ALTSTRING works, with Elt() and Set_elt() methods, or would there be some
>>> problems with that I’m not seeing due to lists not being atomic vectors?
>>>
>>> I was taking an approach of converting each list element (of a
>>> file-based list data structure) to an ALTREP representation to build up an
>>> “ALTREP list”.
>>>
>>> This seems fine for shorter lists with large elements, but I noticed
>>> that for longer lists with smaller elements, this could be far more
>>> time-consuming than simply reading the entire list into memory and
>>> returning a non-ALTREP list:
>>>
>>> > x
>>> <34840 length> matter_list :: out-of-memory list
>>> (1.1 MB real | 543.3 MB virtual)
>>>
>>> > system.time(y <- as.list(x))
>>>    user  system elapsed
>>>   1.116   2.175   5.053
>>>
>>> > system.time(z <- as.altrep(x))
>>>    user  system elapsed
>>>  36.295   4.717  41.216
>>>
>>> > .Internal(inspect(y))
>>> @108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>>>   @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 404.093,404.096,404.099,404.102,404.105,...
>>>   @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 409.924,409.927,409.931,409.934,409.937,...
>>>   @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 400.3,400.303,400.306,400.309,400.312,...
>>>   @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 402.179,402.182,402.185,402.188,402.191,...
>>>   @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 403.021,403.024,403.027,403.03,403.033,...
>>>   ...
>>>
>>> > .Internal(inspect(z))
>>> @108210000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>>>   @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1129, mem=0)
>>>   @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, mem=0)
>>>   @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1878, mem=0)
>>>   @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=2266, mem=0)
>>>   @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1981, mem=0)
>>>   ...
>>>
>>> In this situation, it would be much faster and simpler for me to return
>>> a theoretical ALTREP list that serves SEXP elements on-demand, similar to
>>> how ALTSTRING seems to be implemented.
>>>
>>> I don’t know how many other people would get a use out of ALTREP lists,
>>> but I certainly would.
>>>
>>> Are there any plans for this?
>>>
>>> Thanks!
>>>
>>> ~~~
>>> Kylie Ariel Bemis
>>> Khoury College of Computer Sciences
>>> Northeastern University
>>> https://kuwisdelu.github.io
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>> --
>> Michael Lawrence
>> Scientist, Bioinformatics and Computational Biology
>> Genentech, A Member of the Roche Group
>> Office +1 (650) 225-7760
>> michafla using gene.com
>>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

