[R] Export Unicode characters from R

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jul 15 20:38:58 CEST 2011


On 15/07/2011 1:42 PM, Sverre Stausland wrote:
> >>>
> >>>  >    funny.g<- "\u1E21"
> >>>  >    funny.g
> >>
> >>  [1] "ḡ"
> >>
> >>>  >    data.frame (funny.g) ->    funny.g
> >>>  >    funny.g$funny.g
> >>
> >>  [1] ḡ
> >>  Levels: <U+1E21>
> >
> >  I think the problem is in the data.frame code, not in writing. Data.frames
> >  try to display things in a readable way, and since you're on Windows where
> >  UTF-8 is not really supported, the code helpfully changes that character to
> >  the "<U+1E21>" string for display.
>
> I thought the data.frame function didn't alter the unicode coding,
> since funny.g$funny.g above still displays the right unicode character
> (although it does list the levels as <U+1E21>).
>
> >  You should be able to write the Unicode character to file if you use lower
> >  level methods such as cat(), on a connection opened using the file()
> >  function with the encoding set explicitly.
>
> I'm sorry, but I don't understand what it means "to use cat() on a
> connection opened using the file() function". Could you please clarify
> that?
>

I just checked how R itself does this.  We use the UTF-8 encoding in the
help pages, regardless of what kind of system you're running on.

That code first converts the strings to UTF-8 internally (your funny.g is
already encoded that way; see Encoding(funny.g)), then uses

writeLines( ..., useBytes=TRUE)

to write it.  The useBytes argument says not to try to make the file 
readable on the local system, just write out the bytes.
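
Here is a minimal sketch of the same idea; it is not the actual help-system
code.  The output path is a temporary file chosen for illustration, and I
open the connection in binary mode so that the line ending is written
exactly as given in sep rather than translated for the platform.

```r
# Sketch only: write a UTF-8 string to a file byte for byte.
funny.g <- "\u1E21"
utf8.g <- enc2utf8(funny.g)             # no change if already UTF-8

out.file <- tempfile(fileext = ".txt")  # hypothetical output path
con <- file(out.file, open = "wb")      # binary mode: no CRLF translation
writeLines(utf8.g, con, sep = "\n", useBytes = TRUE)
close(con)

# Inspect what was actually written: three UTF-8 bytes plus the newline.
written <- readBin(out.file, what = "raw", n = 16)
print(written)
```

Opening the file in an editor that understands UTF-8 should then show the
character itself rather than "<U+1E21>".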

Another way to do it is to get your strings in the UTF-8 encoding, 
convert them to raw vectors, and use writeBin() to write those out.  For 
example,

funny.g <- "\u1E21"
rawstuff <- charToRaw(funny.g)
writeBin(rawstuff, "funny.g.txt")
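
If you want to check the result, a sketch along these lines reads the bytes
back and rebuilds the string.  The temporary file name is only for
illustration (the example above writes to "funny.g.txt" instead).  Note that
writeBin() writes exactly the bytes you give it, so there is no trailing
newline unless you add one.

```r
# Sketch only: round-trip a UTF-8 string through a file using raw bytes.
funny.g <- "\u1E21"
out.file <- tempfile(fileext = ".txt")  # hypothetical output path
writeBin(charToRaw(funny.g), out.file)

# Read the file back as raw bytes; U+1E21 is e1 b8 a1 in UTF-8.
bytes <- readBin(out.file, what = "raw", n = file.size(out.file))
print(bytes)

# Rebuild the string and tell R that the bytes are UTF-8.
back <- rawToChar(bytes)
Encoding(back) <- "UTF-8"
print(identical(back, funny.g))
```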


All of this appears hard, because you're thinking of UTF-8 as text, but 
on Windows, R thinks of it as a binary encoding.  Modern Windows systems 
can handle UTF-8, but not all programs on them can.
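
To see what "binary encoding" means here, it may help to look at the
declared encoding of a string and at its bytes.  This is just an
illustrative sketch; the latin1 string is an example I made up, built from
a raw byte so that it does not depend on your locale.

```r
# Sketch only: inspect the declared encoding and the underlying bytes.
funny.g <- "\u1E21"
print(Encoding(funny.g))    # strings built from \u escapes are marked UTF-8
print(charToRaw(funny.g))   # the three bytes that make up U+1E21

# A latin1 string (e-acute, byte 0xe9) converted to UTF-8 for comparison.
latin <- rawToChar(as.raw(0xe9))
Encoding(latin) <- "latin1"
latin.utf8 <- enc2utf8(latin)
print(charToRaw(latin.utf8))  # two bytes in UTF-8: c3 a9
```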

Duncan Murdoch


