[R] Non-ASCII characters in R on Windows

Duncan Murdoch murdoch.duncan at gmail.com
Thu Sep 19 12:45:38 CEST 2013


On 13-09-19 5:06 AM, Maxim Linchits wrote:
> Have any of the thread participants sent a bug report to R? If not,
> let me know if you intend to do so. Otherwise, I'll send a report
> myself.

There's no bug, as far as I know.  The issue is that various functions 
(by design) convert strings to the local encoding, and in the example 
you were trying, the local encoding can't represent all the characters, 
so they are shown using the hex codes, and things get messed up.
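
As a rough illustration of that kind of conversion (a sketch only: it uses 
iconv() with a latin1 target to stand in for a local encoding that lacks the 
characters, and it is not the code path read.table actually follows; the 
variable names are made up for the example):

```r
# A UTF-8 string containing Cyrillic text, written with \u escapes so that
# this script itself stays pure ASCII.
x <- "\u041f\u0440\u0438\u0432\u0435\u0442"
Encoding(x)                     # the string is marked as UTF-8

# Assumption: "latin1" plays the role of the local encoding here.
# With the default sub = NA, a string that cannot be converted becomes NA.
lost <- iconv(x, from = "UTF-8", to = "latin1")
is.na(lost)

# With sub = "byte", each byte that cannot be converted is replaced by its
# hex code in angle brackets, so the original characters are no longer there.
y <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")
print(y)

# The substituted string is plain ASCII and longer than the original.
nchar(x)
nchar(y)
```

Once the substitution has happened, the result is an ordinary string of 
angle brackets and hex digits, so converting it back to UTF-8 does not 
recover the original text.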

I'm currently looking into changing the design, so that there is more 
use of UTF-8 internally.  This is likely to have side effects, which 
need to be investigated carefully.

Duncan Murdoch

>
> thanks
>
> On Tue, Sep 17, 2013 at 5:01 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>> On 13-09-17 8:15 AM, Milan Bouchet-Valat wrote:
>>>
>>> On Monday 16 September 2013 at 20:04 +0400, Maxim Linchits wrote:
>>>>
>>>> Here is that old post:
>>>>
>>>> http://r.789695.n4.nabble.com/read-csv-and-FileEncoding-in-Windows-version-of-R-2-13-0-td3567177.html
>>>>
>>>> A taste: "Again, the issue is that opening this UTF-8 encoded file
>>>> under R 2.13.0 yields an error, but opening it under R 2.12.2 works
>>>> without any issues. (...)"
>>>
>>> I have tried with R 2.12.2 both 32 and 64 bit on Windows Server 2008
>>> with the French (CP1252) locale, and I still experience an error with
>>> the test case I provided in previous messages. So it does not sound like
>>> it is the same issue.
>>
>>
>>
>> I can reproduce the error with a file sent to me by Maxim.  From a quick
>> look, I suspect that changes will be needed to read.table to handle this,
>> and they'll be large enough that they won't make it into 3.0.2, but
>> hopefully will go into R-patched after the release.
>>
>> Duncan Murdoch


