[Rd] Objectsize function visiting every element for alt-rep strings
Gabriel Becker
g@bembecker @end|ng |rom gm@||@com
Fri Jan 18 23:49:21 CET 2019
Travers,
Great to hear you're trying out the ALTREP stuff, good on you :).
Did you mean the get_altstring_Elt_method? I see the code in size.c within
utils that grabs each element, but I don't see any setting (and the setters
are no-ops currently anyway; they just do things the old way).
One thing we have to decide is what object.size means for an altrep. I tend
to think it should mean the size of the alternative representation
currently in use in memory, but I see that a small note in ?object.size
indicates that the size of objects with compact internal representations may be
overestimated, so technically this is "as currently documented". The "we"
here, of course, is the R-core team so we'll have to see how they feel on
the matter.
As for what to do about it, one possibility is to add an object.size method
to the ALTREP method table that gets called if object.size is called on an
ALTREP object. In this case, it would be up to the class to define an
appropriate object.size method. That would be relatively easy to do from a
technical standpoint on R's side, but what comes out of object.size would
be a bit "Wild West-y", without the consistency and correctness guarantees
one might expect from a function in utils.
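
To make that first option concrete, here is a rough sketch of the dispatch
idea in plain C. It is a toy model only: toy_vec, toy_methods and
toy_object_size are hypothetical names made up for illustration, nothing here
is R's actual ALTREP API, and the fallback just multiplies length by element
size rather than visiting elements the way size.c does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in R's C API. */
typedef struct toy_vec toy_vec;

typedef struct {
    size_t (*length)(const toy_vec *x);
    /* Optional: NULL means the class did not supply a size method. */
    size_t (*object_size)(const toy_vec *x);
} toy_methods;

struct toy_vec {
    const toy_methods *methods;
    size_t n;           /* logical length of the vector */
    size_t elt_bytes;   /* bytes per element if fully materialized */
    size_t state_bytes; /* bytes actually held by the compact representation */
};

static size_t toy_length(const toy_vec *x) { return x->n; }

/* A class-supplied size method: report only what is held in memory. */
static size_t compact_size(const toy_vec *x) {
    return sizeof *x + x->state_bytes;
}

/* Use the class method if one was registered; otherwise fall back to
   the size the payload would have if it were fully materialized. */
size_t toy_object_size(const toy_vec *x) {
    assert(x != NULL && x->methods != NULL);
    if (x->methods->object_size != NULL)
        return x->methods->object_size(x);
    return sizeof *x + x->methods->length(x) * x->elt_bytes;
}

/* Two toy classes: one that registers a size method, one that does not. */
static const toy_methods with_size = { toy_length, compact_size };
static const toy_methods without_size = { toy_length, NULL };
```

With this shape, a class that registers the method gets to decide what is
reported, which is exactly where the "Wild West-y" concern comes from: nothing
stops two classes from counting different things.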
Another option is to have object.size recurse, calling object.size on
the two parts (SEXPs which together make up a CONS cell, I believe) that
make up an ALTREP internally. Roughly speaking one of these is usually the
alternative representation while the other is the spot to put an object
with the traditional representation if the payload is ever fully
materialized in an altrep-unsafe way - e.g., C code grabs a writable
dataptr via INTEGER, REAL, DATAPTR, etc. Note there are exceptions to what
I said above, though, such as the wrapper ALTREP classes which always have
the parent object (typically a traditionally laid-out vector), because the
"alternative representation" part is strictly a metadata annotation in that
case and contains no representation of the payload data for those classes.
In this second case the result of object.size would be consistent across
all ALTREP classes, but in both cases the result of object.size would no
longer give any information about the size of a vector *payload*. This is
consistent with how object.size deals with external pointers now, but could
lead to some surprise in the case of vectors which the end user may not
even know are ALTREPs.
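
A similarly rough sketch of this second option, again in plain C and again
as a toy model under my own assumptions: toy_obj and toy_size are hypothetical
names, the two pointer fields stand in for the two parts of an ALTREP object
described above, and the real object headers and allocation classes are
ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; this is not R's SEXP layout. */
typedef struct toy_obj {
    int is_altrep;               /* nonzero: the two parts below are used */
    size_t payload_bytes;        /* ordinary object: bytes of payload */
    const struct toy_obj *data1; /* usually the alternative representation */
    const struct toy_obj *data2; /* usually the materialized copy, if any */
} toy_obj;

/* Size of an ALTREP node is its own header plus whatever its two parts
   report; no element of the logical vector is ever visited. */
size_t toy_size(const toy_obj *x) {
    if (x == NULL)
        return 0;
    if (!x->is_altrep)
        return sizeof *x + x->payload_bytes;
    /* Toy invariant: an ALTREP node carries at least one part. */
    assert(x->data1 != NULL || x->data2 != NULL);
    return sizeof *x + toy_size(x->data1) + toy_size(x->data2);
}
```

In this model a compact vector reports only the size of its small state
object until something forces a materialized copy into the second slot, at
which point the reported size jumps. That is the behaviour that might surprise
a user who does not know the vector is an ALTREP.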
Thoughts from anyone else on this list?
Anyway, thanks for pointing this out. I'll talk with Luke and see what
makes sense to do here.
Best,
~G
On Wed, Jan 16, 2019 at 3:49 AM Travers Ching <traversc using gmail.com> wrote:
> I have a toy alt-rep string package that generates randomly seeded strings.
>
> example:
> library(altstringisode)
> x <- altrandomStrings(1e8)
> head(x)
> [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
> etc
> object.size(x)
>
> Object.size will call the set_altstring_Elt_method for every single
> element, materializing (slowly) every element of the vector. This is
> a problem mostly in R-studio since object.size is called
> automatically, defeating the purpose of alt-rep.
>
> Is there a way to avoid the problem of forced materialization in rstudio?
>
> PS: Is there a way to tell if a post has been received by the mailing
> list? How long does it take to show up in the archives?
>