[R] gsub: replacing double backslashes with single backslash

Greg Snow 538280 at gmail.com
Wed Mar 7 18:57:51 CET 2012


The issue here is the difference between what is contained in a string
and what R displays to you.

The string produced with the code:

> tmp <- "C:\\"

only has 3 characters (as David pointed out), the third of which is a
single backslash: the 1st \ escapes the 2nd, and the R string parsing
rules turn that pair into a single backslash in the string.  When you
print the string (whether you call print directly or indirectly), the
print function escapes special characters, including the backslash, so
you see "\\", which represents a single backslash in the string.  If
you use the cat function instead of the print function, then you will
see only a single backslash (other escape sequences such as \n also
display differently in print vs. cat output).  There are other ways to
see the exact string (write it to a file, use it in certain commands,
etc.), but cat is probably the simplest.
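
A quick sketch of the difference, in case it helps.  The variable names
(tmp, x, out) are only for illustration, and the comments describe what
I would expect to see at the console:

```r
tmp <- "C:\\"

nchar(tmp)        # 3: the characters are C, :, and one backslash
print(tmp)        # [1] "C:\\"  (print escapes the backslash for display)
cat(tmp, "\n")    # C:\        (cat writes the characters as they are)
writeLines(tmp)   # C:\        (same idea, with a newline added for you)

# Other escape sequences behave the same way
x <- "a\nb"
nchar(x)          # 3: a, a newline, b
print(x)          # [1] "a\nb"
cat(x, "\n")      # a and b on separate lines

# In the 'replacement' argument of gsub a literal backslash has to be
# written as "\\\\", so this call hands back the same 3-character string
out <- gsub("\\\\", "\\\\", tmp)
identical(out, tmp)   # TRUE
```

So the "C:\\" that print shows and the C:\ that cat shows are the same
string; only the display differs.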

On Wed, Mar 7, 2012 at 7:57 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Mar 7, 2012, at 6:54 AM, Markus Elze wrote:
>
>> Hello everybody,
>> this might be a trivial question, but I have been unable to find this
>> using Google. I am trying to replace double backslashes with single
>> backslashes using gsub.
>
>
> Actually you don't have double backslashes in the argument you are
> presenting to gsub. The string entered at the console as "C:\\" only has a
> single backslash.
>
>> nchar("C:\\")
> [1] 3
>
>
>> There seems to be some unexpected behaviour with regards to the
>> replacement string "\\". The following example uses the string C:\\ which
>> should be converted to C:\ .
>>
>> > gsub("\\\\", "\\", "C:\\")
>> [1] "C:"
>
>
> But I do not understand that returned value, either. I thought that the
> 'repl' argument (which I think I have demonstrated is a single backslash)
> would get put back in the returned value.
>
>
>
>> > gsub("\\\\", "Test", "C:\\")
>> [1] "C:Test"
>> > gsub("\\\\", "\\\\", "C:\\")
>> [1] "C:\\"
>
>
> I thought the parsing rules for 'replacement' were different than the rules
> for 'patt'. So I'm puzzled, too. Maybe something changed in 2.14?
>
>> sub("\\\\", "\\", "C:\\", fixed=TRUE)
> [1] "C:\\"
>
>> sub("\\\\", "\\", "C:\\")
> [1] "C:"
>> sub("([\\])", "\\1", "C:\\")
> [1] "C:\\"
>
> The NEWS file does say that there is a new regular expression implementation
> and that the help file for regex should be consulted.
>
> And presumably we should study this:
>
> http://laurikari.net/tre/documentation/regex-syntax/
>
>  In the 'replacement' argument, the "\\" is used to back-reference a
> numbered sub-pattern, so perhaps "\\" is now getting handled as the "null
> subpattern"? I don't see that mentioned in the regex help page, but it is a
> big "page". I also didn't see "\\" referenced in the TRE documentation, but
> then again I don't think that "\\" in console or source() input is a double
> backslash. The TRE document says that "A \ cannot be the last character of
> an ERE." I cannot tell whether that rule gets applied to the 'replacement'.
>
>
>>
>>
>> I have observed similar behaviour for fixed=TRUE and perl=TRUE. I use R
>> 2.14.1 64-bit on Windows 7.
>
>
>
> --
> David Winsemius, MD
> West Hartford, CT
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com


