[R] Regexp pattern but fixed replacement?
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Thu Apr 11 19:08:41 CEST 2024
On 11/04/2024 12:57 p.m., Dave Dixon wrote:
> Backslashes in regular expressions in R are maddening, but they make sense.
>
> R string handling interprets your replacement string "\\" as just one
> backslash. Your string is received by gsub as "\" - that is, just the
> control backslash, NOT the character backslash. gsub is expecting to see
> \0, \1, \2, or some other control starting with backslash.
>
> If you want gsub to replace with a backslash character, you have to send
> it as "\\". In order to get two backslash characters in an R string, you
> have to double them ALL: "\\\\".
You can use "\\" if the pattern is declared as "fixed", via

  sub("a", "\\", "abcdef", fixed = TRUE)

or

  stringr::str_replace("abcdef", stringr::fixed("a"), "\\")

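As a quick check of the fixed-pattern route in base R, here is a minimal sketch. The variable name res is only illustrative, and the stringr variant is left out so that the snippet runs without extra packages.

```r
# With fixed = TRUE the pattern is a literal string and the replacement
# is inserted as-is, so a single backslash survives.
res <- sub("a", "\\", "abcdef", fixed = TRUE)

writeLines(res)   # raw characters: \bcdef
nchar(res)        # 6: one backslash plus "bcdef"
```
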
My first question was whether there is a sub-like function with a way to
declare the pattern as a regexp, but the replacement as fixed. Thanks
for your answer to my second question.
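
In case no such function turns up, one possible workaround is a small wrapper that escapes the backslashes in the replacement before handing it to gsub(), using the same doubling idea as in my original post. The name gsub_literal is hypothetical, not an existing function, and the sketch assumes backslash is the only special character in the replacement.

```r
# Hypothetical helper (not part of base R or stringr): regex pattern,
# literal replacement. Assumes backslash is the only character that
# gsub() treats specially in the replacement.
gsub_literal <- function(pattern, replacement, x, ...) {
  # Double every backslash so gsub() reads each pair as one literal backslash.
  escaped <- gsub("\\", "\\\\", replacement, fixed = TRUE)
  gsub(pattern, escaped, x, ...)
}

res <- gsub_literal("a|b", "\\", "abcdef")
writeLines(res)   # raw characters: \\cdef
```

The inner call uses fixed = TRUE so that the escaping step itself needs no regular-expression quoting.
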
Duncan Murdoch
>
> The string that is output is an R string: the backslashes are escaped
> with a backslash, so "\\\\" really means two backslashes.
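
A short illustration of the difference between the printed form and the actual characters may help here. The variable name out is arbitrary.

```r
out <- gsub("a|b", "\\\\", "abcdef")

print(out)        # escaped display form: [1] "\\\\cdef"
writeLines(out)   # actual characters:    \\cdef
nchar(out)        # 6: two backslashes plus "cdef"
```
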
>
> There are lots of special characters in the search string, but only one
> in the replacement string: backslash.
>
> My favorite resource on this topic is
> https://www.regular-expressions.info/replacecharacters.html
>
>
> On 4/11/24 10:35, Duncan Murdoch wrote:
>> I noticed this issue in stringr::str_replace, but it also affects
>> sub() in base R.
>>
>> If the pattern in a call to one of these needs to be a regular
>> expression, then backslashes in the replacement text are treated
>> specially.
>>
>> For example,
>>
>> gsub("a|b", "\\", "abcdef")
>>
>> gives "cdef", not "\\\\cdef" as I wanted. To get the latter, I need to
>> escape the replacement backslashes, e.g.
>>
>> gsub("a|b", "\\\\", "abcdef")
>>
>> which gives "\\\\cdef".
>>
>> I have two questions:
>>
>> 1. Is there a variant on sub or str_replace which allows the pattern
>> to be declared as a regular expression, but the replacement to be
>> declared as fixed?
>>
>> 2. To get what I want, I can double the backslashes in the
>> replacement text. This would do that:
>>
>> replacement <- gsub("\\\\", "\\\\\\\\", replacement)
>>
>> Are there any other special characters to worry about besides
>> backslashes?
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.