[R] gsub: replacing double backslashes with single backslash
David Winsemius
dwinsemius at comcast.net
Wed Mar 7 15:57:02 CET 2012
On Mar 7, 2012, at 6:54 AM, Markus Elze wrote:
> Hello everybody,
> this might be a trivial question, but I have been unable to find
> this using Google. I am trying to replace double backslashes with
> single backslashes using gsub.
Actually you don't have double backslashes in the argument you are
presenting to gsub. The backslash is the escape character in R string
literals, so the string entered at the console as "C:\\" has only a
single backslash.
> nchar("C:\\")
[1] 3
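A minimal sketch of the same point, for anyone who wants to check it
at the console: print() re-escapes the backslash for display, while
cat() and the character codes show what is actually stored. The
variable name x is only for illustration.

```r
x <- "C:\\"            # typed with two backslashes, stored with one

nchar(x)               # counts C, :, and a single backslash
cat(x, "\n")           # cat() writes the raw characters
print(x)               # print() shows the escaped form again

# Character codes: 67 = C, 58 = :, 92 = backslash (only one of them)
codes <- utf8ToInt(x)
codes
```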
> There seems to be some unexpected behaviour with regards to the
> replacement string "\\". The following example uses the string C:\\
> which should be converted to C:\ .
>
> > gsub("\\\\", "\\", "C:\\")
> [1] "C:"
But I do not understand that returned value, either. I thought that
the 'replacement' argument (which I think I have demonstrated is a
single backslash) would get put back in the returned value.
> > gsub("\\\\", "Test", "C:\\")
> [1] "C:Test"
> > gsub("\\\\", "\\\\", "C:\\")
> [1] "C:\\"
I thought the parsing rules for 'replacement' were different from the
rules for 'pattern'. So I'm puzzled, too. Maybe something changed in
2.14?
> sub("\\\\", "\\", "C:\\", fixed=TRUE)
[1] "C:\\"
> sub("\\\\", "\\", "C:\\")
[1] "C:"
> sub("([\\])", "\\1", "C:\\")
[1] "C:\\"
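If the input really does contain two consecutive backslashes, the
results above suggest two ways to collapse them into one. This is
only a sketch based on the behaviour shown in this thread; the names
two, one_fixed and one_regex are made up for the example.

```r
two <- "C:\\\\"        # four backslashes typed, two stored
nchar(two)             # 4 characters: C, :, backslash, backslash

# fixed = TRUE: pattern and replacement are both taken literally,
# so "\\\\" is two backslashes and "\\" is one.
one_fixed <- gsub("\\\\", "\\", two, fixed = TRUE)

# Regular expression: each literal backslash must be escaped in the
# pattern, and the backslash in the replacement is escaped as well.
one_regex <- gsub("\\\\\\\\", "\\\\", two)

cat(one_fixed, "\n")
cat(one_regex, "\n")
```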
The NEWS file does say that there is a new regular expression
implementation and that the help file for regex should be consulted.
And presumably we should study this:
http://laurikari.net/tre/documentation/regex-syntax/
In the 'replacement' argument, a backslash followed by a digit (typed
as "\\1" to "\\9") back-references a numbered sub-pattern, so perhaps
a lone backslash is now getting handled as the "null subpattern"? I
don't see that mentioned in the regex help page, but it is a big
"page". I also didn't see a lone backslash referenced in the TRE
documentation, but then again "\\" in console or source() input is a
single backslash, not a double one. The TRE document says that "A \
cannot be the last character of an ERE." I cannot tell whether that
rule gets applied to the 'replacement'.
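To make the comparison concrete, here is a small sketch that puts a
back-reference, a lone backslash, and an escaped backslash side by
side in the 'replacement'. The outputs for the last two match what is
reported earlier in this message; the explanation that the backslash
acts as an escape character in the replacement is my reading of those
results, not something I found stated in the documentation.

```r
x <- "C:\\"

# Backslash + digit refers to a parenthesised group: swaps "C" and ":"
with_backref <- sub("(C)(:)", "\\2\\1", x)

# A lone backslash in the replacement: nothing is put back
lone <- sub("\\\\", "\\", x)

# An escaped backslash in the replacement: one backslash is put back
escaped <- sub("\\\\", "\\\\", x)

cat(with_backref, "\n")
cat(lone, "\n")
cat(escaped, "\n")
```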
>
>
> I have observed similar behaviour for fixed=TRUE and perl=TRUE. I
> use R 2.14.1 64-bit on Windows 7.
--
David Winsemius, MD
West Hartford, CT