[R] Accented characters, windows
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Mar 30 00:56:05 CEST 2016
On 29/03/2016 5:39 PM, Jan Kacaba wrote:
> I have a problem with accented characters. My OS is Win 8.1 and I'm using
> RStudio.
>
> I make string :
> av="ěščřž"
>
> When I call "av" I get the result below.
>> av
> [1] "ìšèøž"
>
> The resulting characters are different. I have a similar problem when I write
> a string to a file. In RGui, calling "av" prints the characters correctly,
> but using the "write" function to write the string to a file results in the
> same problem.
>
> Can you please help me how to deal with it?
You don't say what code page you're using.
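
For reference, a minimal sketch of how the code page can be checked from within R. It uses only base R functions; the example locale string in the comment is an assumption about what a Czech Windows setup might report, not output taken from your system.

```r
# Report the character-type locale. On Windows the part after the dot is
# the code page (for example something like "Czech_Czech Republic.1250";
# that value is illustrative only).
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session is multibyte, UTF-8 or Latin-1.
# On Windows the list also carries a `codepage` element.
info <- l10n_info()
print(info)
```

Including both results in a follow-up message would make the problem much easier to diagnose.
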
R on Windows has a long-standing problem: it works mainly in the
local code page, rather than working in UTF-8 as most other systems do.
(This is due to the fact that when the internationalization was put
in, UTF-8 was exotic, rather than ubiquitous as it is now.) So R can
store UTF-8 strings on any system, but for display it converts them to
the local code page, and that conversion can lose information if the
characters aren't supported locally.
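
To illustrate the storage-versus-display distinction, here is a rough sketch using base R only. The string is built from Unicode escapes so that the encoding of the script file does not matter. The file-writing part is a commonly used workaround and an assumption on my part about what might help; the name `path` is just a temporary file chosen for the example, and I have not checked this on every Windows setup.

```r
# The five characters from the question, written as Unicode escapes:
# e-caron, s-caron, c-caron, r-caron, z-caron.
av <- "\u011b\u0161\u010d\u0159\u017e"

# How R has marked the string internally, and the bytes it actually holds.
print(Encoding(av))
print(charToRaw(av))

# Converting to a code page that lacks some of these characters loses
# information; what you get back depends on the platform's iconv.
print(iconv(av, from = "UTF-8", to = "latin1", sub = "?"))

# Write the UTF-8 bytes to a file without translating them to the local
# code page: useBytes = TRUE tells writeLines() to skip re-encoding.
path <- tempfile(fileext = ".txt")   # hypothetical output file
con <- file(path, open = "w")
writeLines(enc2utf8(av), con, useBytes = TRUE)
close(con)

# Read the file back, declaring the bytes as UTF-8, and compare.
back <- readLines(path, encoding = "UTF-8")
print(identical(charToRaw(back), charToRaw(av)))
```

If the bytes round-trip through the file unchanged, then the stored string is intact and only the display step is failing.
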
With your string, I don't see the same thing as you do; I see
"ešcrž"
which is also incorrect, but looks a little closer, because it does a
better approximation in my code page.
So if you think my result is better than yours, you could change your
system to code page 437 as I'm using, but that will probably cause you
worse problems.
Probably the only short-term solution that would be satisfactory is to
stop using Windows. At some point in the future the internal character
handling in R needs an overhaul, but that's a really big, really
thankless job. Perhaps Microsoft/Revolution will donate some programmer
time to do it, but more likely, it will wait for volunteers in R Core to
do it. I don't think it will happen in 2016.
Duncan Murdoch