[R] sink() and UTF-8 on non-UTF-8 systems
Milan Bouchet-Valat
nalimilan at club.fr
Fri Apr 11 17:49:19 CEST 2014
Hi!
In the series "dealing with encoding madness on hostile systems", I'm
looking for help with capturing UTF-8 output from R on a system whose
locale does not use UTF-8, and where some characters cannot even be
represented in the locale encoding. The case I have in mind is
printing a character vector containing Russian text to the R Commander
output window on an English/French (CP1252) Windows system.
Here's a code snippet illustrating the problem:
> "\U41F"
[1] "П" # OK
> con <- file(open="w+", encoding="UTF-8")
> capture.output(cat("\U41F"), file=con)
> readLines(con, encoding="UTF-8")
[1] "<U+041F>" # Not OK
(same result without specifying 'encoding')
Now I have read ?sink and it is quite explicit about how this works:
> If file is a character string, the file will be opened using the
> current encoding. If you want a different encoding (e.g. to represent
> strings which have been stored in UTF-8), use a file connection — but
> some ways to produce R output will already have converted such strings
> to the current encoding.
The last clause seems to apply to the case above: somewhere in the
process, the UTF-8 string is converted to the locale encoding, and since
CP1252 cannot represent "П", the escape <U+041F> is written instead. Is
there any way to get the correct output?
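For comparison, here is a minimal sketch that avoids cat() and print()
altogether and writes the UTF-8 bytes directly to a file. It does not
capture printed output, so it is not a solution to the question above; it
only suggests that the connection itself is not where the character gets
lost. The names `path`, `x` and `res` are made up for illustration, and the
approach assumes that writeLines() with useBytes=TRUE passes the bytes
through unchanged when the connection does no re-encoding.

```r
# Hypothetical temporary file, used only for this illustration.
path <- tempfile(fileext = ".txt")

# CYRILLIC CAPITAL LETTER PE; a \u escape yields a string marked as UTF-8.
x <- "\u041f"

# Open the connection with no re-encoding ("native.enc"), so that the
# bytes handed to writeLines() are written as they are.
con <- file(path, open = "w", encoding = "native.enc")

# enc2utf8() makes sure the bytes are UTF-8; useBytes = TRUE asks
# writeLines() not to translate them to the locale encoding first.
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read the line back and mark it as UTF-8 rather than as native encoding.
res <- readLines(path, encoding = "UTF-8")
```

If this round-trips correctly on a CP1252 system, the conversion to
<U+041F> would seem to happen in the output routines used by
capture.output(), before the text ever reaches the connection.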
Thanks
> sessionInfo()
R Under development (unstable) (2014-04-10 r65396)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base