[R] gsub issue in R 2.11.1, but not present in 2.9.2

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue Jun 29 20:55:46 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Bert Gunter
> Sent: Tuesday, June 29, 2010 11:08 AM
> To: 'Jason Rupert'; 'Duncan Murdoch'
> Cc: r-help at r-project.org
> Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2
> 
> Jason:
> 
> I think it's actually even a bit worse than what Duncan said, which was:
> 
> -----------
> "You need to double the backslashes to enter them in an R string.  So
> 
> gsub("N\\A", "NA", original, fixed=TRUE)
> 
> should work if original contains a single backslash, and
> 
> gsub("N\\\\A", "NA", original, fixed=TRUE)
> 
> should work if it contains a double one.  Two things add to the confusion
> here:  First, a single backslash will be displayed doubled by print(). .. "
> ------
> 
> Well, let's see: (On R version 2.11.1, 2010-5-31 for Windows)
> 
> > astring <- "n\a"
> > print(astring)
> [1] "n\a"
> 
> So Duncan's last sentence appears to be incorrect. The "\" is not displayed
> doubled. However ...

But Duncan's statement is correct.  In your example above, there is no backslash character in the variable astring.  It contains the letter 'n' and the control character '\a', which is a single character (print() shows the backslash only to indicate the control character).  If there were actually a backslash character in the string, print() would have doubled it.
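
To check this at the console, here is a minimal sketch, assuming a plain R session. The variable names n_chars, codes, has_backslash and b_codes are mine, not from the thread.

```r
# "n\a" is two characters: the letter n and the BEL control character (code 7).
astring <- "n\a"
n_chars <- nchar(astring)        # 2
codes   <- utf8ToInt(astring)    # 110 (n) and 7 (BEL); no 92 (backslash)

# Search for a literal backslash; fixed = TRUE treats the pattern as plain text.
has_backslash <- grepl("\\", astring, fixed = TRUE)   # FALSE

# "n\\a" is three characters: n, a real backslash (code 92), and a.
bstring <- "n\\a"
b_codes <- utf8ToInt(bstring)    # 110, 92, 97

print(astring)   # [1] "n\a"   control character shown as an escape
print(bstring)   # [1] "n\\a"  the single backslash is displayed doubled
```

The character codes show what is stored in each string, independent of how print() chooses to display it.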
  

> 
> > bstring <- "N\A"
> Error: '\A' is an unrecognized escape in character string starting "N\A"
> 
> What's going on? Well, the "\a" in astring is a _single_ escape sequence (for
> a beep/bell sound, on Windows anyway: cat("\a") should make a sound). So the
> "\" in "\a" is printed as correctly undoubled. However, since the "\A" in
> bstring does _not_ correspond to any escape sequence, the expression "\A"
> cannot be parsed and an error is thrown. But:
> 
> > bstring <- "N\\A"
> > print(bstring)
> [1] "N\\A"   ## is fine
> 
> ## ... Noting that
> 
> > nchar("\\A")
> [1] 2
> 
> So whether a "\" needs to be doubled or not depends on whether the parser
> can interpret it as part of a legitimate escape sequence, whence
> 
> gsub("\a","","\a") ## works but
> gsub("\A","","\A") ## does not.

Whether "\" needs to be doubled depends on what you want the string value to be.  If you want the single control character, '\a', then you don't want to double it.  If you want the string to contain 2 characters '\' and 'a', then you must enter '\\a'.
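
The two cases can be compared side by side. This is only a sketch; the names one_char, two_chars and shown are hypothetical, chosen for illustration.

```r
one_char  <- "\a"    # a single control character (BEL)
two_chars <- "\\a"   # a backslash followed by the letter a

nchar(one_char)      # 1
nchar(two_chars)     # 2

# writeLines() writes the actual contents, not the escaped form print() uses,
# so the backslash appears once here.
shown <- capture.output(writeLines(two_chars))
nchar(shown)         # 2
```

Using writeLines() or cat() rather than print() is a quick way to see what a string really contains.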

> 
> To avoid such confusion, I think Duncan's advice to double backslashes
> should be heeded as much as possible. Unfortunately, I don't think it's
> always possible:

In the newline case below, if you actually want a newline character in the string, then you don't want to double the backslash.
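
To make the difference concrete, here is a small sketch, assuming fixed = TRUE matching as in Duncan's gsub() calls. The names real_newline, literal_slash_n and converted are made up for this example.

```r
real_newline    <- "first line\nsecond line"    # contains one newline character
literal_slash_n <- "first line\\nsecond line"   # contains a backslash and the letter n

nchar(real_newline)      # 22
nchar(literal_slash_n)   # 23, one more because backslash + n is two characters

# Replace the literal backslash-n pair with a real newline.
# With fixed = TRUE the pattern "\\n" is plain text: a backslash followed by n.
converted <- gsub("\\n", "\n", literal_slash_n, fixed = TRUE)

identical(converted, real_newline)   # TRUE
```

So "\n" and "\\n" are both legitimate; which one to type depends on which string you want.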

> 
> > newlineString <- "first line\nsecond line\n"
> > print(newlineString)
> [1] "first line\nsecond line\n"
> > cat(newlineString)
> first line
> second line
> 
> Cheers,

Hope this is helpful,

Dan


Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



