[Rd] special latin1 do not print as glyphs in current devel on windows
Daniel Possenriede
possenriede at gmail.com
Tue Aug 1 14:49:21 CEST 2017
Thank you! My apologies again for not including the console output in my
earlier message. I sent another e-mail with the output in the meantime, so
what I am seeing should be a bit clearer now. If I missed something, please
let me know.
Yes, I am using latin1 and cp1252 interchangeably here, mostly because
Encoding() is reporting the encoding as "latin1". You presumed correctly
that my current/default locale's encoding is CP1252. (I also mentioned
before that my locale is LC_COLLATE=German_Germany.1252.)
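A minimal sketch of where the two encodings differ for the byte discussed in this thread. The variable names are only illustrative, and the string is built from a raw byte so that the result should not depend on the session's locale:

```r
# Byte 0x80 means different things in CP1252 and in ISO-8859-1.
x <- rawToChar(as.raw(0x80))

# Read as CP1252, 0x80 is the euro sign (U+20AC).
as_cp1252 <- iconv(x, from = "CP1252", to = "UTF-8")

# Read as ISO-8859-1, 0x80 is a C1 control character (U+0080).
as_latin1 <- iconv(x, from = "ISO-8859-1", to = "UTF-8")

charToRaw(as_cp1252)  # e2 82 ac
charToRaw(as_latin1)  # c2 80
```

So treating the two names as interchangeable is harmless for most characters, but not for the bytes 0x80 to 0x9f.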
> As you are changing encodings, you do not want to preserve encoding!
>
I am not interested in preserving encodings. What I am worried about is
that the encoding is not marked anymore, i.e. that Encoding() returns
"unknown".
In the cp1252 encoding on Windows (note that I am using the cp1252 escape
"\x80" and not the Unicode escape "\u20AC"):
> x_utf8 <- enc2utf8(c("€", "\x80"))
> Encoding(x_utf8)
[1] "UTF-8" "UTF-8"
> x_nat <- enc2native(x_utf8)
> Encoding(x_nat)
[1] "unknown" "unknown"
See also Kirill's message to this list: "ASCII strings are marked as ASCII
internally, but this information doesn't seem to be available, e.g.,
Encoding() returns "unknown" for such strings."
http://r.789695.n4.nabble.com/source-parse-and-foreign-UTF-8-characters-tp4733523.html
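The behaviour for ASCII strings can be seen in a short sketch. The variable names are illustrative, and the non-ASCII string is built from a raw byte so the example does not depend on the locale:

```r
# Pure ASCII strings never carry a declared encoding.
ascii <- "abc"
Encoding(ascii)             # "unknown"
Encoding(ascii) <- "UTF-8"  # silently ignored for pure ASCII
Encoding(ascii)             # "unknown"

# A non-ASCII string, by contrast, can be marked.
non_ascii <- rawToChar(as.raw(0xe9))  # e-acute in Latin-1
Encoding(non_ascii) <- "latin1"
Encoding(non_ascii)         # "latin1"
```

An "unknown" result from Encoding() therefore does not by itself mean that a marker was lost. It only indicates that for strings containing non-ASCII characters.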
>
> Again, this is not the case with iconv()
>>
>> x_iutf8 <- iconv(x, to = "UTF-8")
>> Encoding(x_iutf8)
>> x_inat <- iconv(x_iutf8, from = "UTF-8")
>> Encoding(x_inat)
>>
>
> iconv is converting from/to the current locale's encoding, presumably
> CP1252, not from the marked encoding (as the help page states explicitly.)
>
I am aware that iconv does not use the marked encoding: you either have to
set the encoding explicitly, or it uses the current locale's default
encoding. As I said, I am worried that the encoding markers get lost with
the enc2* functions, or rather that they are not set correctly. I am only
using the iconv example to show that iconv is able to set the encoding
markers correctly, so it seems to be possible in general.
> x_iutf8 <- iconv(c("€", "\x80"), to = "UTF-8")
> Encoding(x_iutf8)
[1] "UTF-8" "UTF-8"
> x_iutf8
[1] "€" "€"
> x_inat <- iconv(x_iutf8, from = "UTF-8")
> Encoding(x_inat)
[1] "latin1" "latin1"
> x_inat
[1] "\u0080" "\u0080"
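For comparison, a locale-independent sketch of how iconv() marks its result when from and to are given explicitly. The variable names are illustrative, and the mark argument is documented in ?iconv as being for expert use:

```r
# Start from the single CP1252 byte for the euro sign.
x <- rawToChar(as.raw(0x80))

# With the default mark = TRUE, a conversion to "UTF-8" is marked as such.
marked <- iconv(x, from = "CP1252", to = "UTF-8")

# With mark = FALSE the same bytes are returned without a declared encoding.
unmarked <- iconv(x, from = "CP1252", to = "UTF-8", mark = FALSE)

Encoding(marked)    # "UTF-8"
Encoding(unmarked)  # "unknown"
identical(charToRaw(marked), charToRaw(unmarked))  # TRUE
```

The bytes are the same in both cases; only the marker differs, which is the distinction at issue with the enc2* functions above.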