[R] Obtain the hex code for a given character.

Jim Lemon jim at bitwrit.com.au
Wed Feb 5 03:17:18 CET 2014


On 02/05/2014 01:01 PM, Duncan Murdoch wrote:
> On 14-02-04 7:57 PM, Rolf Turner wrote:
>>
>>
>> If I have a character such as "£" stored in an object called "xxx", how
>> can I obtain the hex code representation of this character? In this
>> case I know that the hex code is "\u00A3", but if I didn't, how would I
>> find out?
>
> charToRaw will give you the bytes used to store it:
>
>  > charToRaw("£")
> [1] c2 a3
>
> That was on MacOS, which uses UTF-8 encoding. On Windows, using Latin1,
>
>  > charToRaw("£")
> [1] a3
>
> You won't see 00A3, because that's not an encoding that R uses, that's
> the Unicode "code point". It's not too hard to get to that from the
> UTF-8 encoding, but I don't know any R function that does it.
>
>>
>> I would like a function "foo()" such that foo(xxx) would return, say,
>> the string "00A3".
>
> I don't know how to get that string, but as.character(charToRaw(x)) will
> put the bytes for x in strings, e.g.
>
> as.character(charToRaw("£"))
>
> gives
>
> [1] "c2" "a3"
>
> on a Mac.
>
>>
>> I have googled and otherwise searched around and have come up with
>> nothing that seemed at all helpful to me. If I am missing something
>> obvious, please point me at it.
>>
>> (I have found a table on the web, which contains the information that I
>> need, but it is only accessible "by eye" as far as I can discern.)
>>
>> Supplementary question: Suppose I have the string "00A3" stored in
>> an object called "yyy". How do I put that string together with "\u"
>> so as to obtain "£"? I thought I could do
>>
>> xxx <- paste("\u",yyy,sep="")
>>
>> but R won't let me use "\u" "without hex digits". How can I get around
>> this?
>
> The \u notation with a code point is handled by the R parser, so you
> need to parse that string, which means putting it in quotes first, e.g.
>
> xxx <- eval(parse(text = paste0("'\\u", yyy, "'")))
>
> That seems pretty excessive. You'd probably be better off doing all of
> this in C instead...
>
Hi Rolf,
I almost got it in Linux with:

x<-"\u00A3"
paste("\\u",
  toupper(paste(as.character(charToRaw(x)),sep="",collapse="")),
  sep="",collapse="")
[1] "\\uC2A3"

But I couldn't get rid of the double backslash, so I must agree with 
Duncan. Also, I don't know how the "C2" gets in there.
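
On the "C2": U+00A3 needs two bytes in UTF-8, and the bit layout
110xxxxx 10xxxxxx turns the code point bits 00010 100011 into the bytes
C2 A3. So charToRaw() shows the encoded bytes, not the code point. The
double backslash is only how print() escapes a single backslash; cat()
would show just one. A minimal sketch of the foo() Rolf asked for,
assuming base R's utf8ToInt() is acceptable (the name foo and the
four-digit padding are my choices, not anything settled in this thread):

```r
# Hypothetical helper: return the Unicode code point(s) of the characters
# in a single string as upper-case hex strings, e.g. "00A3".
foo <- function(x) {
  # enc2utf8() converts from the native encoding (e.g. Latin-1 on Windows)
  # so that utf8ToInt() always receives UTF-8.
  code_points <- utf8ToInt(enc2utf8(x))
  # "%04X" pads to at least four upper-case hex digits.
  sprintf("%04X", code_points)
}

xxx <- "\u00A3"
foo(xxx)
# [1] "00A3"
```

Note that utf8ToInt() expects a single string, so a vector of strings
would need sapply() or similar around it.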

Jim
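
P.S. For the supplementary question, going from the string "00A3" back
to the character, it may be possible to avoid eval(parse()) altogether
by converting the hex digits to an integer and handing that to
intToUtf8(). A rough sketch, with bar as a made-up name:

```r
# Hypothetical helper: convert a hex code point string such as "00A3"
# into the character it stands for.
bar <- function(hex) {
  # strtoi() with base = 16L reads the string as a hexadecimal integer;
  # intToUtf8() maps that code point to a UTF-8 string.
  intToUtf8(strtoi(hex, base = 16L))
}

yyy <- "00A3"
xxx <- bar(yyy)
xxx == "\u00A3"
# [1] TRUE
```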



