[R] write.csv converts Åland to <c5>land
Marc Schwartz
m@rc_@chw@rtz @end|ng |rom me@com
Tue Oct 20 17:03:45 CEST 2020
Hi,
One additional option that you might want to look at is writeLines() (see ?writeLines) with 'useBytes = TRUE'; the default is FALSE.
Windows, as Duncan notes, is problematic with extended encodings, and you can end up with conflicting encodings for the same text, depending on the encoding used within R versus the native encoding set by the OS.
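To see which encodings are in play, something along these lines may help. It is only a sketch: the variable x is made up for illustration, and what l10n_info() and enc2native() return depends on your OS and locale, so no output is shown for them.

```r
# Compare the session's native encoding with the encoding marked on a string.
x <- "\u00c5land"   # the \u escape keeps this example independent of the file's own encoding

l10n_info()         # locale details: is the session UTF-8? (Windows also reports the code page)
Encoding(x)         # the encoding R has marked on the string
charToRaw(x)        # the underlying bytes; c3 85 is the UTF-8 encoding of Å
enc2native(x)       # the string translated to the native encoding, where the substitution can occur
```

If the native encoding cannot represent Å, the last call is where you would expect to see a substituted form rather than the original character.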
writeLines() does add a step of complexity: you have to pre-format the line(s) yourself so that they conform to CSV formatting. That means using paste() to assemble each output line with field delimiters, double quotes, etc. before writing it.
Essentially, you mimic the default formatting of write.csv() line by line, and then write the resulting character vector to a text file with a single call to writeLines().
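A minimal sketch of that approach follows. The helper name write_csv_bytes and the temporary file are made up for illustration, and the formatting only approximates write.csv(): every field is quoted, embedded double quotes are doubled, row names are not written, and NA values and factors get no special handling.

```r
# Hypothetical helper: write a data frame as CSV via writeLines(useBytes = TRUE).
write_csv_bytes <- function(df, file) {
  # Quote a vector CSV-style: convert to UTF-8, double embedded quotes, wrap in quotes.
  quote_field <- function(x) {
    paste0('"', gsub('"', '""', enc2utf8(as.character(x)), fixed = TRUE), '"')
  }
  header <- paste(quote_field(names(df)), collapse = ",")
  # Quote every column, then paste the columns together row by row.
  cols <- lapply(df, quote_field)
  body <- do.call(paste, c(unname(cols), sep = ","))
  # useBytes = TRUE writes the strings' bytes as they are instead of
  # re-encoding them to the native encoding first.
  writeLines(c(header, body), con = file, useBytes = TRUE)
  invisible(file)
}

abc <- data.frame(name = c("\u00c5land", "Afghanistan"), stringsAsFactors = FALSE)
out <- tempfile(fileext = ".csv")
write_csv_bytes(abc, out)
```

The resulting file holds UTF-8 bytes, so whatever reads it back needs to be told that, e.g. readLines(out, encoding = "UTF-8").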
Regards,
Marc Schwartz
> On Oct 20, 2020, at 10:28 AM, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> You don't say, but I'd guess you're using Windows. In your code page, the character Å is probably not representable. At some point in the sequence of operations involved in printing the dataframe R puts the string into the native encoding, and since that's impossible on your system, it substitutes the <c5> instead. The fact that you can sometimes display it is because internally R uses UTF-8 as much as it can, and it can represent the character.
>
> One fix for this is to switch from Windows to some other OS. The others all have proper support for UTF-8.
>
> You might have luck changing your Windows code page to one that includes the Å, but then there'll be some other characters that are missed.
>
> You should definitely investigate Eberhard's advice, and test non-base packages like readr. They are all written much more recently than the base functions, and might have proper support for out-of-code-page characters.
>
> Duncan Murdoch
>
> On 20/10/2020 8:20 a.m., Jinsong Zhao wrote:
>> Hi there,
>> Why is the same string displayed in different forms?
>> > abc[,1]
>> [1] "Åland" "Afghanistan"
>> > abc
>> name
>> 1 <c5>land
>> 2 Afghanistan
>> And more...
>> > dput(abc, "aa.txt")
>> > dget("aa.txt")
>> name
>> 1 <c5>land
>> 2 Afghanistan
>> > dget("aa.txt")[,1]
>> [1] "<c5>land" "Afghanistan"
>> Best,
>> Jinsong
>> On 2020/10/20 17:13, Jinsong Zhao wrote:
>>> Hi there,
>>>
>>> I tried to export the names of countries to a csv file with write.csv().
>>> In the resulting file, Åland was converted to <c5>land. Is there any way
>>> to prevent this from happening? Thanks!
>>>
>>> > abc
>>> [1] "Åland"
>>> > write.table(abc, file = "")
>>> "x"
>>> "1" "<c5>land"
>>>
>>> Best,
>>> Jinsong