[Rd] translateChar in NewName in bind.c
Suharto Anggono Suharto Anggono
suharto_anggono at yahoo.com
Tue Aug 1 18:54:34 CEST 2017
For the 2nd example, I would say that the R 3.4.1 result is acceptable, as names(c(x)) and names(x) are equal.
The change exposed by the 2nd example is in line with the statement in the NEWS item corresponding to PR#17284: "c() and unlist() are now more efficient in constructing the names(.) of their return value, ...." However, that NEWS item is currently listed for R-devel, not for R 3.4.1 patched.
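A minimal sketch of the comparison behind the 2nd example, using only base R. It relies on identical() treating two strings as equal when they agree after translation to UTF-8, whatever their declared encoding. The Encoding() and charToRaw() output depends on the R version, as discussed in the quoted message below, so no particular value is assumed here.

```r
# The 2nd example: a name marked as latin1.
x <- 0
names(x) <- "\xe7"
Encoding(names(x)) <- "latin1"
res <- c(x)

# Same character either way, so the names compare as equal.
names_equal <- identical(names(res), names(x))
print(names_equal)

# The declared encoding and the raw bytes are what may differ between versions.
print(Encoding(names(res)))
print(charToRaw(names(res)))

# Converting both to UTF-8 yields the same bytes.
same_utf8_bytes <- identical(charToRaw(enc2utf8(names(res))),
                             charToRaw(enc2utf8(names(x))))
print(same_utf8_bytes)
```

Both comparisons look at the characters rather than at the declared encoding, which is the sense in which the two results are "equal" above.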
--------------------------------------------
On Mon, 31/7/17, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Subject: Re: [Rd] translateChar in NewName in bind.c
Cc: r-devel at r-project.org
Date: Monday, 31 July, 2017, 8:38 PM
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Sun, 30 Jul 2017 14:57:53 +0000 writes:
> R-devel's bind.c has been ported to R-patched. Is that OK, given that the names of an 'unlist' or 'c' result may not be strictly the same as in R 3.4.1 because of the changed function 'NewName' in bind.c?
> Using 'translateCharUTF8' instead of 'translateChar' is as it should be. It has an effect in a non-UTF-8 locale for this example.
> x <- list(1:2)
> names(x) <- "\ue7"
> res <- unlist(x)
> charToRaw(names(res)[1])
> Directly assigning 'tag' to 'ans' is more efficient, but
> the result may differ from R 3.4.1, which goes through
> 'translateCharUTF8'; that is also correct. It has an
> effect for this example.
> x <- 0
> names(x) <- "\xe7"
> Encoding(names(x)) <- "latin1"
> res <- c(x)
> Encoding(names(res))
> charToRaw(names(res))
Yes, you are right, thank you:
That part of the changes in bind.c was *not* directly related to
the two R-bugs (PR#17284 & PR#17292)... and therefore, maybe I
should not have ported it to R-patched (= R 3.4.1 patched).
Your examples above are instructive.. notably the 2nd one seems
to demonstrate to me, that the change also *did* fix a bug:
Encoding(names(res))
is "latin1" in R-devel but interestingly is "UTF-8" in R 3.4.1,
indeed independently of the locale.
I would argue R-devel (and current R-patched) is more faithful
by keeping the Encoding "latin1" that was set for names(x) also
in the names(c(x)) .
I could revert R-patched's bind.c (so it only contains the two
official bug fixes PR#172(84|92)) but I wonder if it is
desirable in this case.
I'm glad for further reasoning.
Given current "knowledge"/"evidence", I would not revert
R-patched to R 3.4.1's behavior.
Martin