[R] Regular expressions: bug or misunderstanding?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Jul 6 23:27:50 CEST 2008


Try adding perl = TRUE to the gsub() call.
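
A likely reading of the "qEq" result, offered as a sketch rather than a definitive diagnosis: without perl = TRUE, gsub() follows POSIX-style matching, which prefers the longest match at the leftmost position. At position 0 of "qq" the pattern can match "q" (via ^) or "qq" (via [^E] consuming the first q). The longer match wins, so the first q lands inside \\1 and is never escaped. Perl-style matching tries the alternatives in order instead. A Perl-only variant that avoids consuming the preceding character at all is a negative lookbehind. The pattern and the helper name escape_q below are illustrative assumptions, not part of the original post; E and q are the stand-ins defined in the quoted message.

```r
# Hypothetical helper (name and pattern are assumptions for illustration).
escape_q <- function(x) {
  # (?<!E)  : the run of escapes must not be preceded by another E
  # ((EE)*) : an even number of escapes (possibly zero), captured as \1
  # q       : the quote that needs escaping
  # The lookbehind is zero-width, so a quote directly after another
  # quote is still examined on its own.
  gsub("(?<!E)((EE)*)q", "\\1Eq", x, perl = TRUE)
}

# The four test strings from the quoted message.
escape_q(c("Eq", "EEq", "qaq", "qq"))
```

Lookbehind is only available with perl = TRUE, so this variant will not work with the default regular expression engine.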

On Sun, Jul 6, 2008 at 5:17 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> I'm trying to write a gsub() call that takes a string and escapes all the
> unescaped quote marks in it.  So the string
>
> \"
>
> would be left unchanged, but
>
> \\"
>
> would be changed to
>
> \\\"
>
> because the double backslash doesn't act as an escape for the quote; the
> first backslash just escapes the second.  I have the usual problems of writing regular
> expressions involving backslashes which make everything I write completely
> unreadable, so I'm going to change the problem for this post:  I will define
> E to be the escape character, and q to be the quote; the gsub() call would
> leave
>
> Eq
>
> unchanged, but would change
>
> EEq
>
> to EEEq, etc.
>
> The expression I have come up with after this change is
>
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)
>
> i.e. "(start of line, or non-escape, followed by an even number of escapes),
> all of which we call expression 1, followed by a quote, is replaced by
> expression 1 followed by an escape and a quote".
>
> This works sometimes, but not always:
>
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
> [1] "Eq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
> [1] "EEEq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
> [1] "EqaEq"
> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
> [1] "qEq"
>
> Notice that in the final example, the first quote doesn't get escaped.  Why
> not????
>
> Duncan Murdoch


