[Rd] Objectsize function visiting every element for alt-rep strings

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jan 21 10:02:03 CET 2019


>>>>> Travers Ching 
>>>>>     on Tue, 15 Jan 2019 12:50:45 -0800 writes:

    > I have a toy alt-rep string package that generates
    > randomly seeded strings.  Example:
    >
    >   library(altstringisode)
    >   x <- altrandomStrings(1e8)
    >   head(x)
    >   [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
    >   object.size(x)

    > object.size() will call the Elt method (the one set via
    > R_set_altstring_Elt_method) for every single element,
    > materializing (slowly) every element of the vector.  This
    > is a problem mostly in RStudio, since object.size is called
    > automatically there, which defeats the purpose of alt-rep.

Hmm.  But still, the idea had been that object.size() *should*
return the size of the "de-ALTREP'ed" object *but* should not
de-ALTREP it.
That's what happens for integers, but indeed it fails to happen
for as.character(.)'ed integers such as these.

From my eRum presentation (which borrowed from the official ALTREP
documentation, https://svn.r-project.org/R/branches/ALTREP/ALTREP.html):

  > x <- 1:1e15
  > object.size(x) # 8000'000'000'000'048 bytes : 8000 TBytes -- ok, not really
  8000000000000048 bytes
  > is.unsorted(x) # FALSE : i.e., R *knows* it is sorted
  [1] FALSE
  > xs <- sort(x)  #
  > .Internal(inspect(x))
  @80255f8 14 REALSXP g0c0 [NAM(7)]  1 : 1000000000000000 (compact)
  > 

  > cx <- as.character(x)
  > .Internal(inspect(cx))
  @80485d8 16 STRSXP g0c0 [NAM(1)]   <deferred string conversion>
    @80255f8 14 REALSXP g1c0 [MARK,NAM(7)]  1 : 1000000000000000 (compact)
  > system.time( print(object.size(x)), gc=FALSE)
  8000000000000048 bytes
     user  system elapsed 
    0.000   0.000   0.001 
  > system.time( print(object.size(cx)), gc=FALSE)
  Error: cannot allocate vector of size 8388608.0 Gb
  Timing stopped at: 11.43 0 11.46
  > 

One could consider it a bug that object.size(cx) indeed inspects
every string, i.e., accesses cx[i] for all i.
Note that it does *not* de-ALTREP cx itself:

> x <- 1:1e6
> cx <- as.character(x)
> .Internal(inspect(cx))

@7f5b1a0 16 STRSXP g0c0 [NAM(1)]   <deferred string conversion>
  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
> system.time( print(object.size(cx)), gc=FALSE)
64000048 bytes
   user  system elapsed 
  0.369   0.005   0.374 
> .Internal(inspect(cx))
@7f5b1a0 16 STRSXP g0c0 [NAM(7)]   <deferred string conversion>
  @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
> 
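A back-of-the-envelope decomposition of the two sizes reported above
suggests why the string case differs.  It assumes, as the numbers
themselves suggest, a 48-byte vector header on this 64-bit build, and
(an inference, not taken from the object.size() sources) one 56-byte
CHARSXP per short string, since all of "1" ... "1000000" have at most
7 characters.  The double vector's size follows from its length alone,
whereas each string contributes its own CHARSXP whose size depends on
that string's length, so object.size() has to obtain every element:

```latex
\begin{aligned}
\texttt{object.size(1:1e15)}               &= 48 + 10^{15}\times 8      = 8\,000\,000\,000\,000\,048 \text{ bytes}\\
\texttt{object.size(as.character(1:1e6))}  &= 48 + 10^{6}\times(8 + 56) = 64\,000\,048 \text{ bytes}
\end{aligned}
```

Here the 8 in the second line is the per-element pointer in the STRSXP
and the 56 is the assumed per-string CHARSXP cost.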

    > Is there a way to avoid the problem of forced
    > materialization in RStudio?

    > PS: Is there a way to tell if a post has been received by
    > the mailing list?  How long does it take to show up in the
    > archives?

[ That (waiting-time) distribution is quite right-skewed... I'd
  guess its median is less than 10 minutes, but we have
  artificially delayed posts somewhat in the past to fight
  spammers, and ETH (the hosting institution) and others have
  increased spam and virus filtering, so everything has become
  quite a bit slower. ]
