[R] grep and gsub on backslash and quotes

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Aug 12 18:21:40 CEST 2003


"Simon Fear" <Simon.Fear at synequanon.com> writes:

> The following code works to gsub single quotes to double quotes:
> 
> line <- gsub("'", '"', line)
> 
> (that's a single quote within doubles then a double within singles if
> your viewer's font is not good).
> 
> But The R Language Manual tells me that
> 
> Quotes and other special characters within strings
> are specified using escape sequences:
> \' single quote
> \" double quote
> 
> so why is the following wrong: gsub("\\\\'", "\\\\"", line)? That or any
> other number of backslashes (have tried all up to n=6 just for good
> measure).

There's a backslash missing in the replacement. This works:

line <- "ab\\\'cd"
gsub("\\\\'", "\\\\\"", line)

and will replace \' with \" in the string.
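
To check a result like this, it helps to look at the string itself
rather than its printed, re-escaped form. A minimal sketch in base R
(writeLines and nchar are not mentioned above; they are only used here
for inspection):

```r
# The string from the example: a, b, backslash, single quote, c, d
line <- "ab\\\'cd"
writeLines(line)          # shows the raw characters: ab\'cd
n_chars <- nchar(line)    # 6 characters, not 8

# Pattern as a regex:  \\'  (literal backslash, then single quote)
# Replacement:         \\"  (literal backslash, then double quote)
res <- gsub("\\\\'", "\\\\\"", line)
writeLines(res)           # ab\"cd
```

print() re-escapes backslashes and quotes, which is why the console
shows more backslashes than the string really contains.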
 
> BTW is it documented anywhere that you need four backslashes in an RE to
> match one in the target, when it is being passed as an argument to gsub
> or grep? How would I know how many levels of doubling up to use for any
> other functions? (I got to 4 consecutive \ by trial and error in this
> case, but have a dim memory of having read about it somewhere.)

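There are two levels because the backslash is an escape character both
in R string literals and in regular expressions: the R parser halves the
backslashes first, and the regex engine then halves what is left. So in
the above, "line" is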

ab\'cd

and the match pattern is 

\\' which matches \' 

and the replacement is

\\" which becomes \"
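
The two levels can be made visible by printing the pattern after R has
parsed it, which is what the regex engine actually receives. A sketch in
base R; the fixed = TRUE call is an addition for contrast, since it
skips the regex level altogether:

```r
line    <- "ab\\\'cd"
pattern <- "\\\\'"

# Level 1: the R parser turns four typed backslashes into two
writeLines(pattern)              # \\'
pattern_len <- nchar(pattern)    # 3

# Level 2: the regex engine reads \\ as one literal backslash
hit_regex <- grepl(pattern, line)               # TRUE

# With fixed = TRUE there is no regex level, so two typed
# backslashes are enough to match the one in the target
hit_fixed <- grepl("\\'", line, fixed = TRUE)   # TRUE
```

So for any other function, the number of doublings is one per layer of
interpretation the string passes through.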


More interesting is

> gsub("\\'", "a", line)
[1] "ab\\'cda"
> gsub("\\'", "a", line, perl=TRUE)
[1] "ab\\acd"

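so \' matches a single quote with PCRE but not with ordinary RE. (Yes,
there's a reason: in GNU-style regular expressions \' is an anchor for
the end of the string, which is why the "a" ends up appended there.)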
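
Since the meaning of \' depends on the regex engine, the safer way to
match a literal single quote is to leave it unescaped. A sketch under
that assumption; the result of the non-perl call may differ between R
versions, so it is not repeated here:

```r
line <- "ab\\\'cd"

# PCRE treats an escaped non-alphanumeric character as a literal
res_perl  <- gsub("\\'", "a", line, perl = TRUE)

# An unescaped quote has no special meaning in a regex
res_plain <- gsub("'", "a", line)

writeLines(res_perl)    # ab\acd
writeLines(res_plain)   # ab\acd
```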

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



