[R] regular expression question

John McKown john.archie.mckown at gmail.com
Wed Jan 14 17:47:43 CET 2015


On Wed, Jan 14, 2015 at 10:03 AM, MacQueen, Don <macqueen1 at llnl.gov> wrote:

> I know you already have a couple of solutions, but I would like to mention
> that it can be done in two steps with very simple regular expressions. I
> would have done:
>
> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test",
>        'rhofixedtest','norhofixedtest')
> res <- gsub('norhofixed$', '',s)
> res <- gsub('rhofixed$', '',res)
> res
> [1] "lngimbint"      "lngimbnoint"    "test"
>     "rhofixedtest"   "norhofixedtest"
>
>
> (this is for those of us who don't understand regular expressions very
> well!)
>

There is one possible problem with your solution. Consider the string
"arhofixednorhofixed". It ends with norhofixed and, according to the
original specification, should result in "arhofixed". (I will admit this is
a contrived case which is very unlikely to occur in reality.) But since you
apply TWO regular expressions, the first removes the trailing norhofixed,
resulting in "arhofixed" (the correct answer?), and the second then reduces
that to simply "a". The other regular expressions correctly remove either
norhofixed or rhofixed, but never both, if they are written _correctly_.
That is, they check first for norhofixed, with an alternate of rhofixed, or
conditionally match the "no" in front of the rhofixed at the very end of
the string (my
example). To be even more explicit, the regexp "(norhofixed|rhofixed)$"
will work properly. With the "$" anchor, "(rhofixed|norhofixed)$" gives the
same result, because the engine takes the leftmost match and norhofixed
starts two characters before the rhofixed inside it. The order of the
alternatives only matters when both can match at the same starting
position. Yes, regular expressions can be complicated. Although I have a
liking for them due to their expressiveness and power, using them is like a
person using raw nitroglycerin instead of dynamite. Dangerous.
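
A minimal sketch of the difference, using the strings from this thread plus
the contrived one. The names two_step and one_step are only illustrative,
and "(no)?rhofixed$" is one way to write the conditional "no" match, not
necessarily the exact expression posted earlier in the thread.

```r
# Test strings from the thread, plus the contrived edge case.
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test",
       "rhofixedtest", "norhofixedtest", "arhofixednorhofixed")

# Two passes: the second pass can strip a suffix exposed by the first.
two_step <- sub("rhofixed$", "", sub("norhofixed$", "", s))

# One pass with an optional "no" prefix: at most one suffix is removed.
one_step <- sub("(no)?rhofixed$", "", s)

# Side-by-side comparison; only the last row differs.
print(data.frame(input = s, two_step, one_step))
```

The two approaches agree on the first five strings. On the last one the
two-step version leaves "a", while the single pass leaves "arhofixed".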



>
> -Don
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
>
-- 
While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity.  In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown
