[R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)

David Winsemius dwinsemius at comcast.net
Thu May 29 05:39:27 CEST 2014


On May 28, 2014, at 7:25 PM, Thomas Stewart wrote:

> Can anyone help me understand the following behavior?
> 
> I want to replace the letter 'X' in the string 'text X' with '≥'
> (\u2265).  The output from gsub is not what I expect.  It gives: "text ≥".
> 
> Now, suppose I want to replace the character '≤' in the string
> 'text ≤' with '≥'.  Then, gsub gives the expected, desired output.
> 
> What am I missing?
> 
> Thanks for any insight.
> -tgs
> 
> Minimal Working Example:
> 
> string1 <- "text X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1

Try this instead:

> new_string1 <- gsub("X","\\\u2265",string1); new_string1
[1] "text ≥"

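The replacement needs an extra, escaped "\" in front of \u2265. R parses "\\" as a single backslash and "\u2265" as a single character, so the replacement string is two characters long, and gsub treats that backslash as escaping the character after it.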

> nchar("\\")
[1] 1
> nchar("\\\u2265")
[1] 2
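
To make the two-character structure visible, here is a minimal sketch using only base R. The variable names (repl, regex_result, fixed_result) are made up for illustration, and the fixed = TRUE call is an alternative that was not part of the original example; it is shown only because a literal replacement needs no extra backslash.

```r
# Hypothetical variable names; only base R functions are used.
repl <- "\\\u2265"   # parsed as one backslash followed by U+2265

nchar(repl)          # 2
utf8ToInt(repl)      # 92 8805 (backslash, then U+2265)

# Regex mode: the backslash in the replacement escapes the next character,
# so only U+2265 is inserted.
regex_result <- gsub("X", repl, "text X")

# fixed = TRUE: pattern and replacement are taken literally.
fixed_result <- gsub("X", "\u2265", "text X", fixed = TRUE)

# Both results should consist of the same bytes.
charToRaw(regex_result)
charToRaw(fixed_result)
```

If the two byte vectors match, the extra backslash has not added anything to the result; it only changes how the replacement is read.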

You would be well served by reading:

?Quotes

-- 
David.
> 
> string2 <- "text \u2264"; string2
> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
> 
> charToRaw(new_string1)
> charToRaw(new_string2)
> 
> sessionInfo()
> 
> ## OUTPUT
> 
>> string1 <- "text X"; string1
> [1] "text X"
> 
>> new_string1 <- gsub("X","\u2265",string1); new_string1
> [1] "text ≥"
> 
>> string2 <- "text \u2264"; string2
> [1] "text ≤"
> 
>> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
> [1] "text ≥"
> 
>> charToRaw(new_string1)
> [1] 74 65 78 74 20 e2 89 a5


> charToRaw("\\\u2265")
[1] 5c e2 89 a5



> 
>> charToRaw(new_string2)
> [1] 74 65 78 74 20 e2 89 a5
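
Since charToRaw() reports the same bytes for both results, any difference in how they print would have to come from something other than the bytes themselves. One possibility, which is an assumption here and not something established in this thread, is the declared encoding of each string. A minimal sketch of how one might check it with base R (same_bytes and marked are hypothetical names):

```r
new_string1 <- gsub("X", "\u2265", "text X")
new_string2 <- gsub("\u2264", "\u2265", "text \u2264")

# TRUE when both results consist of exactly the same bytes.
same_bytes <- identical(charToRaw(new_string1), charToRaw(new_string2))
same_bytes

# Declared encoding of each result: "unknown", "UTF-8", "latin1" or "bytes".
# The values may depend on the R version and platform.
Encoding(new_string1)
Encoding(new_string2)

# Assumption: if a result holds UTF-8 bytes but is not declared as UTF-8,
# the declaration can be set explicitly; the bytes are left unchanged.
marked <- new_string1
Encoding(marked) <- "UTF-8"
Encoding(marked)
```

Setting the encoding this way only changes the declaration, so it is appropriate only when the bytes are already known to be UTF-8.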
> 
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 

It was a good idea to post sessionInfo(), but it would have been even better to have posted in plain text.


> 	[[alternative HTML version deleted]]
> 
-- 

David Winsemius
Alameda, CA, USA


