[R] gsub: replacing double backslashes with single backslash
Daniel Nordlund
djnordlund at frontier.com
Thu Mar 8 07:08:06 CET 2012
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ista Zahn
> Sent: Wednesday, March 07, 2012 6:55 PM
> To: Greg Snow
> Cc: r-help at r-project.org; Markus Elze
> Subject: Re: [R] gsub: replacing double backslashes with single backslash
>
> On Wed, Mar 7, 2012 at 12:57 PM, Greg Snow <538280 at gmail.com> wrote:
> >
> > The issue here is the difference between what is contained in a string
> > and what R displays to you.
> >
> > The string produced with the code:
> >
> > > tmp <- "C:\\"
> >
> > only has 3 characters (as David pointed out), the third of which is a
> > single backslash, since the 1st \ escapes the 2nd and the R string
> > parsing rules use the combination to put a single backslash in the
> > string. When you print the string (whether you call print directly or
> > indirectly) the print function escapes special characters, including
> > the backslash, so you see "\\" which represents a single backslash in
> > the string. If you use the cat function instead of the print
> > function, then you will only see a single backslash (and other escape
> > sequences such as \n will also display differently in print vs. cat
> > output). There are other ways to see the exact string (write to a
> > file, use in certain commands, etc.) but cat is probably the simplest.
>
>
> Fine, but how does this help the OP (and me!) figure out how to
> replace "C:\\" with "C:\" ?
>
> Best,
> Ista
Ista,
You have received some good explanations of what is going on, but you don't seem to have digested them yet. I don't blame you; I found this difficult myself when I first encountered it. One needs to keep distinct what is actually contained in a string and how R chooses to display it under various circumstances. Consider the example again:
> tmp <- "C:\\"
The variable tmp contains only three characters: 1. a capital C, 2. a colon, and 3. a single backslash. You can tell it only has three characters like this:
> nchar(tmp)
[1] 3
If you use cat() to display the contents, you will also see that it only has three characters. (I included the newline character to force a newline; print() adds one automatically, but cat() doesn't.)
> cat(tmp, '\n')
C:\
So again we see just three characters. However, if we display the variable with print, we will see two backslashes even though there is actually only one backslash in the variable.
> print(tmp)
[1] "C:\\"
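As a further check (a small sketch added here, not part of the session above), you can look at the character codes directly. utf8ToInt() and writeLines() are base R functions; the variable name tmp is carried over from the example.

```r
tmp <- "C:\\"

# Code points of each character: 67 is C, 58 is the colon, 92 is the backslash
codes <- utf8ToInt(tmp)
print(codes)

# writeLines() writes the raw contents followed by a newline, with no escaping
writeLines(tmp)
```

Only one code 92 appears, which agrees with what nchar() and cat() report.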
So when you ask, 'Fine, but how does this help the OP (and me!) figure out how to replace "C:\\" with "C:\"?', you need to be clear about which string you mean: one that merely displays with two backslashes (and contains only one), or one that actually contains two consecutive backslashes, which print() will display as four. If you mean a variable, tmp, that actually has two backslashes in it, then it will display like this:
> tmp
[1] "C:\\\\"
> print(tmp)
[1] "C:\\\\"
> cat(tmp,'\n')
C:\\
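The session above does not show how such a string gets created. One way, assuming you type it as a literal in R code, is to write four backslashes in the source, since each pair collapses to one backslash in the string:

```r
# Four backslashes in the source code give two backslashes in the string
tmp <- "C:\\\\"

n <- nchar(tmp)
print(n)        # 4: C, colon, backslash, backslash
cat(tmp, "\n")
```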
If you then want to change it so that it has only 1 backslash in it, you could do
> tmp <- sub('\\\\', '\\', tmp)
> tmp
[1] "C:\\"
> print(tmp)
[1] "C:\\"
> cat(tmp,'\n')
C:\
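If you would rather not reason about how the regular-expression engine treats backslashes in the pattern and the replacement, here are two alternative sketches (additions for illustration, not from the original exchange). The names fixed_result and regex_result are made up for this example; gsub() and its fixed argument are base R.

```r
tmp <- "C:\\\\"   # contains two backslashes

# Option 1: fixed = TRUE turns off regular-expression handling, so the
# pattern is two literal backslashes and the replacement is one.
fixed_result <- gsub("\\\\", "\\", tmp, fixed = TRUE)

# Option 2: stay with regular expressions and escape everything.
# The pattern matches two backslashes; the replacement yields one.
regex_result <- gsub("\\\\\\\\", "\\\\", tmp)

cat(fixed_result, "\n")
cat(regex_result, "\n")

# Both results hold three characters: C, colon, and one backslash
stopifnot(nchar(fixed_result) == 3, identical(fixed_result, regex_result))
```

Because these use gsub() rather than sub(), every doubled backslash in a longer path would be collapsed, not just the first one.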
Hope this is helpful,
Dan
Daniel Nordlund
Bothell, WA USA