[Rd] [External] Re: Any plans for ALTREP lists (VECSXP)?

Gabriel Becker <gabembecker using gmail.com>
Wed Jul 24 19:43:31 CEST 2019


@Kylie happy to collaborate on it if you're interested.

~G

On Wed, Jul 24, 2019 at 10:43 AM Gabriel Becker <gabembecker using gmail.com>
wrote:

> I can work on this. Thanks Luke.
>
> ~G
>
> On Wed, Jul 24, 2019 at 8:25 AM Tierney, Luke <luke-tierney using uiowa.edu>
> wrote:
>
>> If one of you wanted to try to create a patch to support ALTREP
>> generic vectors here are some notes:
>>
>> The main challenge I am aware of (there might be others): Allowing
>> DATAPTR to return a writable pointer would be too dangerous because
>> the GC write barrier needs to see all mutations. So it would be best
>> if Dataptr and Dataptr_or_null methods were not allowed to be
>> defined. The default methods in altrep.c should do the right thing.
>>
>> A reasonable name for the abstract class would be 'altlist'.
>>
>> 'altrep' methods that a class can provide:
>>
>>    Unserialize or UnserializeEX
>>    Serialized_state
>>    Duplicate or DuplicateEX
>>    Coerce
>>    Inspect
>>    Length
>>
>> 'altvec' methods a class should provide:
>>
>>    Extract_subset
>>    not Dataptr
>>    not Dataptr_or_null
>>
>> 'altlist' specific methods:
>>
>>    Elt
>>    Set_elt
>>
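
The notes above can be sketched as a toy model in plain C, with no R
headers involved. Everything here is an assumption made for illustration:
Elem, LazyList, ListMethods and the lazy_* functions are hypothetical names,
not part of R's ALTREP API, and the n_barrier_calls counter only stands in
for the GC write barrier. What it tries to show is that elements are served
on demand through an Elt-style method, and that no writable pointer to the
element storage is ever handed out, so every mutation has to pass through
the Set_elt-style method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: every name here is hypothetical, none is from R's headers. */

typedef struct {            /* stands in for one list element (a SEXP in R) */
    size_t len;
    double *data;
} Elem;

typedef struct LazyList LazyList;

typedef struct {            /* loosely mirrors Length, Elt and Set_elt */
    size_t (*length)(const LazyList *x);
    const Elem *(*elt)(LazyList *x, size_t i);
    void (*set_elt)(LazyList *x, size_t i, Elem *v);
} ListMethods;

struct LazyList {
    const ListMethods *methods;
    size_t n;
    Elem **cache;            /* cache[i] stays NULL until first requested */
    size_t n_materialized;   /* how many elements were built on demand */
    size_t n_barrier_calls;  /* stand-in for GC write barrier notifications */
};

Elem *elem_new(size_t len, double fill) {
    Elem *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->len = len;
    e->data = malloc((len ? len : 1) * sizeof *e->data);
    if (e->data == NULL) abort();
    for (size_t i = 0; i < len; i++) e->data[i] = fill;
    return e;
}

static void elem_free(Elem *e) {
    if (e != NULL) { free(e->data); free(e); }
}

static size_t lazy_length(const LazyList *x) { return x->n; }

/* Elt: build element i the first time it is asked for (here: i + 1 copies
   of the value i, in place of reading from a file). */
static const Elem *lazy_elt(LazyList *x, size_t i) {
    assert(i < x->n);
    if (x->cache[i] == NULL) {
        x->cache[i] = elem_new(i + 1, (double)i);
        x->n_materialized++;
    }
    return x->cache[i];
}

/* Set_elt: the only write path, so every mutation can be reported.
   Takes ownership of v. */
static void lazy_set_elt(LazyList *x, size_t i, Elem *v) {
    assert(i < x->n);
    x->n_barrier_calls++;
    elem_free(x->cache[i]);
    x->cache[i] = v;
}

static const ListMethods lazy_methods = { lazy_length, lazy_elt, lazy_set_elt };

LazyList *lazy_list_new(size_t n) {
    LazyList *x = malloc(sizeof *x);
    if (x == NULL) abort();
    x->cache = malloc((n ? n : 1) * sizeof *x->cache);
    if (x->cache == NULL) abort();
    for (size_t i = 0; i < n; i++) x->cache[i] = NULL;
    x->methods = &lazy_methods;
    x->n = n;
    x->n_materialized = 0;
    x->n_barrier_calls = 0;
    return x;
}

void lazy_list_free(LazyList *x) {
    for (size_t i = 0; i < x->n; i++) elem_free(x->cache[i]);
    free(x->cache);
    free(x);
}
```

Creating a list of 34840 slots with lazy_list_new() allocates no elements
up front; calling x->methods->elt(x, i) builds only the element asked for.
In a real class the counter in lazy_set_elt() would be replaced by whatever
the runtime needs to do when a slot changes.
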
>> Best,
>>
>> luke
>>
>> On Tue, 23 Jul 2019, Gabriel Becker wrote:
>>
>> > Hi Kylie,
>> >
>> > Is it a list with only numerics in it? (I only see REALSXPs there, but
>> > obviously inspect isn't showing all of them). If so, you could load it
>> > up into one big vector and then also keep partitioning information
>> > around. Bioconductor does this (see ?IRanges::CompressedList ). The
>> > potential benefit here being that the underlying large vector could
>> > then be a big out-of-memory altrep. How helpful this would be depends
>> > somewhat on what you want to do with it, of course, but it is something
>> > that comes to mind.
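
A rough sketch of the "one big vector plus partitioning information" idea,
in plain C. The names CompressedList, Slice and compressed_elt are
hypothetical and this is not the IRanges implementation; it only assumes
that the partitioning is kept as cumulative end positions, so that element
i is the half-open range between the previous end and ends[i].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the idea, not the IRanges implementation. */
typedef struct {
    const double *values;   /* all list elements concatenated */
    const size_t *ends;     /* ends[i] = one past the last value of element i */
    size_t n;               /* number of list elements */
} CompressedList;

typedef struct {
    const double *data;     /* points into the shared values vector, no copy */
    size_t len;
} Slice;

/* Return element i as a view into the flat vector. */
Slice compressed_elt(const CompressedList *x, size_t i) {
    assert(i < x->n);
    size_t start = (i == 0) ? 0 : x->ends[i - 1];
    Slice s = { x->values + start, x->ends[i] - start };
    return s;
}
```

With this layout a list of many small elements costs one large allocation
plus one offset per element, and only the flat values vector would need an
out-of-memory backing.
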
>> >
>> > Also, I would expect some overhead but that seems like a lot (without
>> > having done super much in the way of benchmarking). What exactly is
>> > as.altrep doing?
>> >
>> > Best,
>> > ~G
>> >
>> > On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel <
>> > r-devel using r-project.org> wrote:
>> >
>> >> Hi Kylie,
>> >>
>> >> As an alternative in the short term, you could consider deriving from
>> >> S4Vectors' List class, implementing the getListElement() method to
>> >> lazily create the objects.
>> >>
>> >> Michael
>> >>
>> >> On Tue, Jul 23, 2019 at 9:09 AM Bemis, Kylie <k.bemis using northeastern.edu>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I was wondering if there were any plans for ALTREP lists (VECSXP)?
>> >>>
>> >>> It seems to me that they could be supported in a similar way to how
>> >>> ALTSTRING works, with Elt() and Set_elt() methods, or would there be
>> >>> some problems with that I’m not seeing due to lists not being atomic
>> >>> vectors?
>> >>>
>> >>> I was taking an approach of converting each list element (of a
>> >>> file-based list data structure) to an ALTREP representation to build
>> >>> up an “ALTREP list”.
>> >>>
>> >>> This seems fine for shorter lists with large elements, but I noticed
>> >>> that for longer lists with smaller elements, this could be far more
>> >>> time-consuming than simply reading the entire list into memory and
>> >>> returning a non-ALTREP list:
>> >>>
>> >>>> x
>> >>> <34840 length> matter_list :: out-of-memory list
>> >>> (1.1 MB real | 543.3 MB virtual)
>> >>>
>> >>>> system.time(y <- as.list(x))
>> >>>    user  system elapsed
>> >>>   1.116   2.175   5.053
>> >>>
>> >>>> system.time(z <- as.altrep(x))
>> >>>    user  system elapsed
>> >>>  36.295   4.717  41.216
>> >>>
>> >>>> .Internal(inspect(y))
>> >>> @108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>> >>>   @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 404.093,404.096,404.099,404.102,404.105,...
>> >>>   @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 409.924,409.927,409.931,409.934,409.937,...
>> >>>   @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 400.3,400.303,400.306,400.309,400.312,...
>> >>>   @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 402.179,402.182,402.185,402.188,402.191,...
>> >>>   @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 403.021,403.024,403.027,403.03,403.033,...
>> >>>   ...
>> >>>
>> >>>> .Internal(inspect(z))
>> >>> @108210000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>> >>>   @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1129, mem=0)
>> >>>   @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, mem=0)
>> >>>   @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1878, mem=0)
>> >>>   @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=2266, mem=0)
>> >>>   @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1981, mem=0)
>> >>>   ...
>> >>>
>> >>> In this situation, it would be much faster and simpler for me to
>> >>> return a theoretical ALTREP list that serves SEXP elements on-demand,
>> >>> similar to how ALTSTRING seems to be implemented.
>> >>>
>> >>> I don’t know how many other people would get a use out of ALTREP
>> >>> lists, but I certainly would.
>> >>>
>> >>> Are there any plans for this?
>> >>>
>> >>> Thanks!
>> >>>
>> >>> ~~~
>> >>> Kylie Ariel Bemis
>> >>> Khoury College of Computer Sciences
>> >>> Northeastern University
>> >>> https://kuwisdelu.github.io
>> >>>
>> >>> ______________________________________________
>> >>> R-devel using r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>> >>
>> >>
>> >> --
>> >> Michael Lawrence
>> >> Scientist, Bioinformatics and Computational Biology
>> >> Genentech, A Member of the Roche Group
>> >> Office +1 (650) 225-7760
>> >> michafla using gene.com
>> >>
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>     Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
>
