[R] gsub() with unicode and escape character

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Jul 17 15:30:01 CEST 2011



On 17.07.2011 15:18, Nipesh Bajaj wrote:
> I am really sorry if I have not understood your statement correctly :(
>
> You said:
> " To put a backslash in the replacement expression of sub or gsub
> (when fixed=FALSE) use 4 backslashes"
>
> I understood that this is fine if I want to replace something with 2
> backslashes. But what if I want to replace it with just 1 backslash? I
> tried the following, but it didn't work (R keeps asking for more input):
>
> gsub("d","\\\",my.data$animals)
>
> You said:
> "replacement expression backslash-digit means to use the digit'th
> parenthesized subpattern as the replacement"
>
> Would you please elaborate on this phenomenon? If I use "backslash-digit
> = 6", I don't see any difference in the end result:
>> gsub("d","\\\\\\",my.data$animals)
> [1] "\\og" "wolf" "cat"
>
> It would be really helpful if you could elaborate more on these issues.


Yes, because after R's parser has processed it, that literal is the
three-character string "\\\". In the replacement, the first two
backslashes mean one literal backslash. The third has nothing left to
escape, so it adds nothing. The result is one backslash followed by
"og", which print() shows as "\\og", exactly as with four backslashes.
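Counting characters, instead of reading print() output, shows what actually ends up in the strings. This is a minimal sketch that assumes the default regex mode (fixed = FALSE) and uses the single word "dog" in place of my.data$animals. It compares the one-backslash literal from the original question with the three-backslash literal discussed here.

```r
# Two of the replacement literals from this thread, as typed in R source.
typed <- c(one = "\\", three = "\\\\\\")
nchar(typed)    # characters the parser actually stores: 1 and 3

# Replace "d" in "dog" with each one, using the default regex mode (fixed = FALSE).
out <- vapply(typed, function(r) gsub("d", r, "dog"), character(1))
print(out)      # print() re-escapes backslashes, so one stored "\" shows as "\\"
nchar(out)      # 2 and 3: "og", then a single backslash followed by "og"
```

The lone trailing backslash contributes nothing in either case. This is why typing 1 backslash loses it entirely, and why typing 3 gives the same result as typing 4.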

Uwe Ligges
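In the replacement, "backslash-digit" means one backslash followed by one digit from 1 to 9, typed in R source as "\\1" to "\\9". It is not a count of backslashes. This is a minimal sketch on the made-up input "dog", using the default regex mode. The variable names are illustrative only.

```r
x <- "dog"

# "\\1" in R source is stored as the two characters \1: a back-reference to group 1.
swapped <- sub("(d)(o)(g)", "\\3\\2\\1", x)   # groups 3, 2, 1 in that order: "god"

# Back-references can be mixed with literal text; groups that are not
# referenced simply do not appear in the result.
wrapped <- sub("(d)(og)", "<\\2>", x)         # "<og>"

# "\\\\1" is stored as \\1: an escaped (literal) backslash, then a plain "1".
literal <- sub("d", "\\\\1", x)               # a backslash, "1", then "og"

cat(swapped, wrapped, literal, sep = "\n")
```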



> Thanks,
>
> On Sun, Jul 17, 2011 at 8:34 AM, William Dunlap <wdunlap at tibco.com> wrote:
>> To put a single backslash into the result of sub or gsub
>> (when fixed=FALSE), type 4 backslashes in the replacement.
>> The rationale is that, in the replacement expression,
>> backslash-digit means to use the digit'th parenthesized
>> subpattern as the replacement and backslash-backslash means
>> to put in a literal backslash.  However, the R parser also uses
>> backslashes to signify things like unicode characters (that
>> backslash is not in the string stored by R, but is just a
>> signal to the parser) and it requires a doubled backslash
>> to enter a backslash.  2*2 is 4 backslashes.  E.g.,
>>
>>   >  gsub("([[:digit:]]+)([[:alpha:]]+)", "alpha=<<\\2>>\\\\numeric=<<\\1>>", c("12P", "34Cat"))
>>   [1] "alpha=<<P>>\\numeric=<<12>>"   "alpha=<<Cat>>\\numeric=<<34>>"
>>   >  cat(.Last.value, sep="\n") # see what is really in the strings
>>   alpha=<<P>>\numeric=<<12>>
>>   alpha=<<Cat>>\numeric=<<34>>
>>
>> I don't know about your unicode/encoding problem.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sverre Stausland
>>> Sent: Saturday, July 16, 2011 7:20 PM
>>> To: r-help at r-project.org
>>> Subject: [R] gsub() with unicode and escape character
>>>
>>> Dear helpers,
>>>
>>> I'm trying to replace a character with a Unicode character (given as a
>>> \u escape) inside a data frame using gsub(), but without success.
>>>
>>>> data.frame(animals=c("dog","wolf","cat"))->my.data
>>>> gsub("o","\u0254",my.data$animals)->my.data$animals
>>>> my.data$animals
>>> [1] "dɔg"  "wɔlf" "cat"
>>>
>>> It's not that a data frame cannot contain Unicode characters; cf., e.g.,
>>>
>>>> data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2
>>>> my.data.2$animals
>>> [1] dɔg  wɔlf cat
>>> Levels: cat d<U+0254>g w<U+0254>lf
>>>
>>> I've done the best I can based on what ?gsub and ?enc2utf8 tell me,
>>> but I haven't found a solution.
>>>
>>> Unrelated to that problem, but also related to gsub(): I can't find
>>> a way to make gsub() treat the backslash as a literal character. In a regular
>>> expression, \\ should represent "the character \", but gsub() does not seem to honor that:
>>>
>>>> data.frame(animals=c("dog","wolf","cat"))->my.data
>>>> gsub("d","\\",my.data$animals)
>>> [1] "og"   "wolf" "cat"
>>>
>>> Thank you
>>> Sverre
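Whether "ɔ" is displayed as such or as <U+0254> (as in the Levels line above) depends on the locale and the console. A display-independent check is to inspect the code points of the result. This is a minimal sketch that assumes a plain character vector instead of the data frame column. It uses enc2utf8() so that utf8ToInt() sees UTF-8 bytes whatever the locale.

```r
animals <- c("dog", "wolf", "cat")

# "\u0254" is parsed into the single character U+0254 (LATIN SMALL LETTER OPEN O).
res <- gsub("o", "\u0254", animals)

# How this prints depends on the locale and console, so check the code points.
codes <- sprintf("U+%04X", utf8ToInt(enc2utf8(res[1])))
print(codes)    # "U+0064" "U+0254" "U+0067"
nchar(res)      # counts characters, not bytes: 3 4 3
```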
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
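The two ways of getting one backslash into the animals column can be compared directly. This is a minimal sketch on the data frame from the question. The names via_regex and via_fixed are made up for illustration. It relies on the fact that, per ?sub, back-references in the replacement are only interpreted when fixed = FALSE. With fixed = TRUE, the stored replacement string is inserted as is.

```r
my.data <- data.frame(animals = c("dog", "wolf", "cat"))

# Regex mode (fixed = FALSE): 4 typed -> 2 stored -> 1 inserted backslash.
via_regex <- gsub("d", "\\\\", my.data$animals)

# Literal mode (fixed = TRUE): the stored replacement is inserted verbatim,
# so 2 typed backslashes (1 stored) are enough.
via_fixed <- gsub("d", "\\", my.data$animals, fixed = TRUE)

identical(via_regex, via_fixed)   # TRUE
cat(via_regex, sep = "\n")        # raw text, one per line: \og, wolf, cat
nchar(via_regex)                  # 3 4 3
```

fixed = TRUE is simpler when the pattern is also a plain string. The regex form is needed when the replacement must also use back-references.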


