[R] Export Unicode characters from R

Sverre Stausland johnsen at fas.harvard.edu
Fri Jul 15 23:36:18 CEST 2011


Hi,

I'm interested in the suggestion to use writeLines(..., useBytes=TRUE),
but how do I use this function when exporting from R? Could you please
provide a simple example?
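
[Editor's sketch, not part of the original message: one possible way to
call writeLines() with useBytes=TRUE. The second string and the temporary
output path are assumptions made for illustration; enc2utf8() is used to
make sure the strings are UTF-8 before their bytes are written.]

```r
# Strings to export; "\u1E21" is the character from this thread,
# the second element is only an illustrative extra.
funny <- c("\u1E21", "plain g")

# Make sure every string is encoded as UTF-8 before writing its bytes.
funny <- enc2utf8(funny)

# Hypothetical output location (a temporary file).
out <- tempfile(fileext = ".txt")

# useBytes = TRUE writes the bytes as they are, without re-encoding
# them to the native encoding of the local system.
writeLines(funny, out, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8.
back <- readLines(out, encoding = "UTF-8")
```

Opening the resulting file in an editor that understands UTF-8 should
show the character itself rather than an escape such as <U+1E21>.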

The following suggestion worked very well:

> funny.g <- "\u1E21"
> rawstuff <- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")

But the function charToRaw() only accepts a single string (a character
vector of length one), and writeBin() cannot be used to export data
frames. Is there any solution along these lines when I have a data frame
with Unicode characters?
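
[Editor's sketch, not part of the original message: one possible way to
export a data frame along these lines is to build the text lines by hand
and write them with writeLines(..., useBytes=TRUE). The data frame, its
column names, the tab separator and the temporary output path are all
assumptions made for illustration. No quoting or escaping of fields is
attempted, so this only suits values that contain no tabs or newlines.]

```r
# Illustrative data frame; the column names and values are assumptions.
df <- data.frame(word = c("\u1E21", "g"), n = c(1L, 2L),
                 stringsAsFactors = FALSE)

# Convert every column to UTF-8 character strings.
cols <- lapply(df, function(x) enc2utf8(as.character(x)))

# Build one tab-separated line per row, preceded by a header line.
header <- paste(names(df), collapse = "\t")
rows <- do.call(paste, c(unname(cols), sep = "\t"))
lines <- c(header, rows)

# Hypothetical output location (a temporary file).
out <- tempfile(fileext = ".tsv")

# Write the bytes as they are, without re-encoding to the native encoding.
writeLines(lines, out, useBytes = TRUE)

# Read the file back as UTF-8 to check the round trip.
back <- readLines(out, encoding = "UTF-8")
```

The point of pasting the lines together manually is that the strings go
straight to writeLines() as UTF-8 bytes, rather than through the display
formatting that produced <U+1E21> earlier in this thread.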

Best
Sverre

On Fri, Jul 15, 2011 at 2:38 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 15/07/2011 1:42 PM, Sverre Stausland wrote:
>>
>> >>>
>> >>>  > funny.g <- "\u1E21"
>> >>>  > funny.g
>> >>
>> >>  [1] "ḡ"
>> >>
>> >>>  > data.frame(funny.g) -> funny.g
>> >>>  > funny.g$funny.g
>> >>
>> >>  [1] ḡ
>> >>  Levels: <U+1E21>
>> >
>> >  I think the problem is in the data.frame code, not in writing.
>> >  Data.frames try to display things in a readable way, and since you're
>> >  on Windows where UTF-8 is not really supported, the code helpfully
>> >  changes that character to the "<U+1E21>" string for display.
>>
>> I thought the data.frame function didn't alter the Unicode coding,
>> since funny.g$funny.g above still displays the right Unicode character
>> (although it does list the levels as <U+1E21>).
>>
>> >  You should be able to write the Unicode character to file if you use
>> >  lower level methods such as cat(), on a connection opened using the
>> >  file() function with the encoding set explicitly.
>>
>> I'm sorry, but I don't understand what it means "to use cat() on a
>> connection opened using the file() function". Could you please clarify
>> that?
>>
>
> I just checked on how R does it.  We use UTF-8 encodings in the help pages,
> regardless of what kind of system you're running on.
>
> It converts the strings to UTF-8 internally first (your funny.g is already
> encoded that way; see Encoding(funny.g)) then uses
>
> writeLines( ..., useBytes=TRUE)
>
> to write it.  The useBytes argument says not to try to make the file
> readable on the local system, just write out the bytes.
>
> Another way to do it is to get your strings in the UTF-8 encoding, convert
> them to raw vectors, and use writeBin() to write those out.  For example,
>
> funny.g <- "\u1E21"
> rawstuff <- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")
>
>
> All of this appears hard, because you're thinking of UTF-8 as text, but on
> Windows, R thinks of it as a binary encoding.  Modern Windows systems can
> handle UTF-8, but not all programs on them can.
>
> Duncan Murdoch
>
>
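
[Editor's sketch, not part of the original thread: the charToRaw() and
writeBin() route quoted above can probably be extended to several strings
by joining them into one string first, since charToRaw() takes a single
string of any length. The second string, the newline separator and the
temporary output path are assumptions made for illustration.]

```r
# Several strings to export; the second one is only illustrative.
strings <- enc2utf8(c("\u1E21", "g"))

# Join the strings into ONE string, one item per line, with a final newline.
joined <- paste0(paste(strings, collapse = "\n"), "\n")

# charToRaw() accepts the single joined string and returns its UTF-8 bytes.
rawstuff <- charToRaw(joined)

# Hypothetical output location (a temporary file).
out <- tempfile(fileext = ".txt")

# writeBin() writes the raw bytes without any re-encoding.
writeBin(rawstuff, out)
```

Because the bytes are written in binary mode, the line ending is exactly
the "\n" that was pasted in, on every platform.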


