[R] Regular expressions: bug or misunderstanding?
Duncan Murdoch
murdoch at stats.uwo.ca
Sun Jul 6 23:17:04 CEST 2008
I'm trying to write a gsub() call that takes a string and escapes all
the unescaped quote marks in it. So the string
\"
would be left unchanged, but
\\"
would be changed to
\\\"
because the double backslash doesn't act as an escape for the quote; the
first backslash just escapes the second. Regular expressions involving
backslashes tend to make everything I write completely unreadable, so
I'm going to change the problem for this post: I will define E to be
the escape character and q to be the quote. The gsub() call would then
leave
Eq
unchanged, but would change
EEq
to EEEq, etc.
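To relate the simplified notation back to the real strings, here is a minimal sketch that maps E to a backslash and q to a double quote with chartr(). The helper name to_real is made up for this post and is not part of any package.

```r
# Hypothetical helper: translate the E/q notation into real characters.
# chartr() maps each character of its first argument to the character at
# the same position in the second, so E -> backslash and q -> double quote.
to_real <- function(x) chartr("Eq", "\\\"", x)

# writeLines() prints the actual characters rather than R's escaped form.
writeLines(to_real(c("Eq", "EEq", "EEEq")))
```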
The expression I have come up with after this change is
gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)
i.e. "(start of line, or non-escape, followed by an even number of
escapes), all of which we call expression 1, followed by a quote, is
replaced by expression 1 followed by an escape and a quote".
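To make the grouping concrete, the following sketch pulls out what each parenthesised group captures for one input. The input "aEEq" and the variable names are only illustrative; regexec() and regmatches() are the base R functions for retrieving capture groups.

```r
# Illustrative only: inspect the capture groups of the pattern on "aEEq".
pattern <- "((^|[^E])(EE)*)q"
x <- "aEEq"

# regexec() records the positions of the whole match and of each group;
# regmatches() turns those positions into the matched substrings.
groups <- regmatches(x, regexec(pattern, x))[[1]]

# Elements: whole match, group 1 (everything before the quote),
# group 2 (start of line or the non-escape character), group 3 (last EE pair).
print(groups)
```

Note that group 2 is part of the match itself, so the non-escape character before the quote is consumed along with it.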
This works sometimes, but not always:
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
[1] "Eq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
[1] "EEEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
[1] "EqaEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
[1] "qEq"
Notice that in the final example, the first quote doesn't get escaped.
Why not????
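One reading that is consistent with these results, offered as a guess rather than a diagnosis: the [^E] alternative consumes the character before the quote, so in "qq" the first q can be taken as the non-escape character and only the second q is treated as the quote. Below is a sketch of a variant that checks the preceding character without consuming it. It assumes that switching to perl = TRUE is acceptable, and the name escape_quotes is made up for this post.

```r
# Hypothetical variant: (?<!E) is a negative lookbehind, which requires that
# the previous character is not E without making it part of the match.
# (?:EE)* is a non-capturing group for an even number of escapes.
# Lookbehind is only available with perl = TRUE.
escape_quotes <- function(x) {
  gsub("(?<!E)((?:EE)*)q", "\\1Eq", x, perl = TRUE)
}

print(escape_quotes(c("Eq", "EEq", "qaq", "qq")))
```

If I have this right, it leaves Eq alone and turns qq into EqEq, but I have only checked it on the inputs shown here.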
Duncan Murdoch