[Rd] Invisible names problem
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Wed Jul 22 23:29:38 CEST 2020
On 22/07/2020 3:29 p.m., Pan Domu wrote:
> I ran into strange behavior when removing names.
>
> Two ways of removing names:
>
> i <- rep(1:4, length.out=20000)
> k <- c(a=1, b=2, c=3, d=4)
>
> x1 <- unname(k[i])
> x2 <- k[i]
> x2 <- unname(x2)
>
> Are they identical?
>
> identical(x1,x2) # TRUE
>
> but no
>
> identical(serialize(x1,NULL),serialize(x2,NULL)) # FALSE
>
> But problem is with serialization type 3, cause:
>
> identical(serialize(x1,NULL,version = 2),serialize(x2,NULL,version =
> 2)) # TRUE
>
> It seems that the second one keeps names somewhere invisibly.
>
> Some function can lost them, e.g. head:
>
> identical(serialize(head(x1, 20001),NULL),serialize(head(x2,
> 20001),NULL)) # TRUE
>
> But not saveRDS (so files are bigger), tibble family keeps them but base
> data.frame seems to drop them.
>
> From my test invisible names are in following cases:
>
> x1 <- k[i] %>% unname()
> x3 <- k[i]; x3 <- unname(x3)
> x5 <- k[i]; x5 <- `names<-`(x5, NULL)
> x6 <- k[i]; x6 <- unname(x6)
>
> but not in this one
> x2 <- unname(k[i])
> x4 <- k[i]; names(x4) <- NULL
>
> What kind of magick is that?
>
> It hits us when we upgrade from 3.5 (when serialization changed) and had
> impact on parallelization (cause serialized objects were bigger).
You can use .Internal(inspect(x1)) and .Internal(inspect(x2)) to see
that the two objects are not identical:
> .Internal(inspect(x1))
@1116b7000 14 REALSXP g0c7 [REF(2)] (len=20000, tl=0) 1,2,3,4,1,...
> .Internal(inspect(x2))
@7f9c77664ce8 14 REALSXP g0c0 [REF(2)] wrapper [srt=-2147483648,no_na=0]
@10e7b7000 14 REALSXP g0c7 [REF(6),ATT] (len=20000, tl=0) 1,2,3,4,1,...
ATTRIB:
@7f9c77664738 02 LISTSXP g0c0 [REF(1)]
TAG: @7f9c6c027890 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000]
"names" (has value)
@10e3ac000 16 STRSXP g0c7 [REF(65535)] (len=20000, tl=0)
@7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII]
[cached] "a"
@7f9c6ae9a678 09 CHARSXP g1c1 [MARK,REF(10013),gp=0x61] [ASCII]
[cached] "b"
@7f9c6c0496c0 09 CHARSXP g1c1 [MARK,REF(10568),gp=0x61,ATT] [ASCII]
[cached] "c"
@7f9c6ad3df40 09 CHARSXP g1c1 [MARK,REF(10029),gp=0x61,ATT] [ASCII]
[cached] "d"
@7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII]
[cached] "a"
...
It looks as though x2 is a tiny ALTREP object acting as a wrapper on the
original k[i], but I might be misinterpreting those displays. I don't
know how to force ALTREP objects to standard representation:
unserializing the serialized x2 gives something like x2, not like x1.
Maybe you want to look at one of the contributed low level packages.
The stringfish package has a "materialize" function that is advertised
to convert anything to standard format, but it doesn't change x2.
Duncan Murdoch
More information about the R-devel
mailing list