[Rd] special latin1 do not print as glyphs in current devel on windows
Daniel Possenriede
possenriede at gmail.com
Tue Aug 1 11:19:51 CEST 2017
Upon further inspection, I think there are at least two problems.
First, there is the issue with printing latin1/cp1252 characters in the
"80" to "9F" code range:
x <- c("€", "–", "‰")
Encoding(x)
print(x)
I assume that these are Unicode escapes!? (Given that Encoding(x) shows
"latin1" I'd rather expect latin1/cp1252 escapes here, but these would be
e.g. "\x80", right? My locale is LC_COLLATE=German_Germany.1252 btw.)
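To tell the two kinds of escape apart, it may help to look at the raw bytes instead of the printed form. A minimal sketch, building the Euro sign from explicit bytes and code points so that it does not depend on the locale or on how the source file is encoded (the variable names are mine, purely for illustration):

```r
# One cp1252 byte, declared as latin1 (which Encoding(x) reports above).
euro_cp1252 <- rawToChar(as.raw(0x80))
Encoding(euro_cp1252) <- "latin1"

# The same character built from its Unicode code point; intToUtf8()
# returns a UTF-8 encoded string.
euro_utf8 <- intToUtf8(0x20AC)

charToRaw(euro_cp1252)  # a single byte: 80
charToRaw(euro_utf8)    # three bytes: e2 82 ac
```

If print() shows something like "\x80", the string still holds the cp1252 byte; a "\u..." or "<U+....>" form means a conversion to Unicode has already taken place.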
Now I don't know why print tries to convert to Unicode, but if these indeed
are Unicode escapes, then there is something wrong with the conversion from
cp1252 to Unicode.
In general, most cp1252 character codes map to the Unicode code point with
the same value: cp1252 "00" -> Unicode "0000", "01" -> "0001", "02" ->
"0002", etc.; see http://www.cp1252.com/.
The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is
"80" in cp1252 but "20AC" in Unicode, and the en dash is "96" in cp1252 but
"2013" in Unicode.
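The mapping I would expect can be checked without relying on print(). A small sketch, assuming the iconv implementation in use knows the encoding name "CP1252" (the variable names are just for illustration):

```r
# cp1252 codes for the Euro sign, the en dash and the per mille sign.
codes <- c(0x80, 0x96, 0x89)

# Build one single-byte string per code, then convert explicitly
# from cp1252 to UTF-8 so the current locale plays no role.
bytes <- vapply(codes, function(b) rawToChar(as.raw(b)), character(1))
utf8  <- iconv(bytes, from = "CP1252", to = "UTF-8")

# Unicode code point of each converted character.
cps <- vapply(utf8, utf8ToInt, integer(1), USE.NAMES = FALSE)
sprintf("U+%04X", cps)
# "U+20AC" "U+2013" "U+2030"
```

A conversion that instead yields U+0080, U+0096 and U+0089 has treated the input as ISO-8859-1 (true latin1), not as cp1252.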
The same error seems to happen with
enc2utf8(x)
With iconv(), however, the result is as expected:
iconv(x, to = "UTF-8")
The second problem, IMO, is that the encoding marks get lost with the enc2*
functions:
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)
x_nat <- enc2native(x_utf8)
Encoding(x_nat)
Again, this is not the case with iconv():
x_iutf8 <- iconv(x, to = "UTF-8")
Encoding(x_iutf8)
x_inat <- iconv(x_iutf8, from = "UTF-8")
Encoding(x_inat)