[Rd] Clearing attributes returns ALTREP, serialize still saves them

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Sat Jul 3 07:18:34 CEST 2021


Hi all,

I don't have a solution yet, but a bit more here:

> .Internal(inspect(x2b))

@7f913826d590 14 REALSXP g0c0 [REF(1)]  wrapper [srt=-2147483648,no_na=0]

  @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
0.45384,0.926371,0.838637,-1.71485,-0.719073,...

  ATTRIB:

    @7f913826dc20 02 LISTSXP g0c0 [REF(1)]

      TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(460)] "data"

      @7f9118310000 14 REALSXP g0c7 [REF(2)] (len=1000000, tl=0)
0.66682,0.480576,-1.13229,0.453313,-0.819498,...

> attr(x2b, "data") <- "small"

> .Internal(inspect(x2b))

@7f913826d590 14 REALSXP g0c0 [REF(1),ATT]  wrapper [srt=-2147483648,no_na=0]

  @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
0.45384,0.926371,0.838637,-1.71485,-0.719073,...

  ATTRIB:

    @7f913826dc20 02 LISTSXP g0c0 [REF(1)]

      TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"

      @7f9118310000 14 REALSXP g0c7 [REF(2)] (len=1000000, tl=0)
0.66682,0.480576,-1.13229,0.453313,-0.819498,...

ATTRIB:

  @7f913826c870 02 LISTSXP g0c0 [REF(1)]

    TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"

    @7f9120580850 16 STRSXP g0c1 [REF(3)] (len=1, tl=0)

      @7f91205808c0 09 CHARSXP g0c1 [REF(3),gp=0x60] [ASCII] [cached]
"small"


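So we can see that the assignment to attr(x2b, "data") IS doing something,
but it isn't doing the right thing: the new value "small" is set on the
wrapper, while the wrapped vector still carries the original 1000000-element
"data" attribute. The fact that the original example assigned NULL instead of
a value was hiding this.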
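
For anyone who wants to check this on their own build, here is a minimal sketch of the comparison, not a fix. It reuses x1b and x2b from the original example; `ref` is just a name I am introducing here for a freshly built vector with the same visible contents. Whether the two sizes actually differ will depend on the R version, so no expected output is shown.

```r
set.seed(1)

# Same construction as in the original report
x1b <- rnorm(100)
attr(x1b, "data") <- rnorm(1000000)
x2b <- x1b
attr(x2b, "data") <- "small"   # assign a value rather than NULL

# The visible attribute is the new, small one
attr(x2b, "data")

# Reference object (hypothetical name): x2b[TRUE] copies the data into a new
# vector and drops the "data" attribute, which is then set again.
ref <- x2b[TRUE]
attr(ref, "data") <- "small"

# If the old attribute is still attached to the wrapped vector, the first
# size will be far larger than the second.
length(serialize(x2b, NULL))
length(serialize(ref, NULL))
```

If the sizes differ by roughly 8 MB, the old attribute is still being written out along with the wrapped vector.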


I will dig into this more if someone doesn't get it fixed before me, but it
won't be until after useR, which is this coming week and for which I'm
preparing multiple talks.


Best,

~G

On Fri, Jul 2, 2021 at 9:15 PM Zafer Barutcuoglu <
zafer.barutcuoglu using gmail.com> wrote:

> Hi all,
>
> Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP
> wrapper which internally still contains the names/dimnames, and calling
> base::serialize on the result writes them out. They are unserialized in the
> same way, with the names/dimnames hidden in the ALTREP wrapper, so the
> problem is not obvious except in wasted time, bandwidth, or disk space.
>
> Example:
>    v1 <- setNames(rnorm(64), paste("element name", 1:64))
>    v2 <- unname(v1)
>    names(v2)
>    # NULL
>    length(serialize(v1, NULL))
>    # [1] 2039
>    length(serialize(v2, NULL))
>    # [1] 2132
>    length(serialize(v2[TRUE], NULL))
>    # [1] 543
>
>    con <- rawConnection(raw(), "w")
>    serialize(v2, con)
>    v3 <- unserialize(rawConnectionValue(con))
>    names(v3)
>    # NULL
>    length(serialize(v3, NULL))
>    # [1] 2132
>
>    # Similarly for matrices:
>    m1 <- matrix(rnorm(64), 8, 8,
>                 dimnames=list(paste("row name", 1:8), paste("col name", 1:8)))
>    m2 <- unname(m1)
>    dimnames(m2)
>    # NULL
>    length(serialize(m1, NULL))
>    # [1] 918
>    length(serialize(m2, NULL))
>    # [1] 1035
>    length(serialize(m2[TRUE, TRUE], NULL))
>    # [1] 582
>
> Previously discussed here, too:
> https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html
>
> This happens with other attributes as well, but less predictably:
>    x1 <- structure(rnorm(100), data=rnorm(1000000))
>    x2 <- structure(x1, data=NULL)
>    length(serialize(x1, NULL))
>    # [1] 8000952
>    length(serialize(x2, NULL))
>    # [1] 924
>
>    x1b <- rnorm(100)
>    attr(x1b, "data") <- rnorm(1000000)
>    x2b <- x1b
>    attr(x2b, "data") <- NULL
>    length(serialize(x1b, NULL))
>    # [1] 8000863
>    length(serialize(x2b, NULL))
>    # [1] 8000956
>
> This is pretty severe: tracking down why serializing a small object kills
> the network means working out which large attributes it may once have had
> during its lifetime around the codebase that are still secretly tagging
> along.
>
> Is there a plan to resolve this? Any suggestions for maybe a C++
> workaround until then? Or an alternative performant serialization solution?
>
> Best,
> --
> Zafer