[R] regular expression question

MacQueen, Don macqueen1 at llnl.gov
Wed Jan 14 18:06:00 CET 2015


Good point, John. It illustrates the danger of assuming there are no "perverse cases".

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062


From: John McKown <john.archie.mckown at gmail.com>
Date: Wednesday, January 14, 2015 at 8:47 AM
To: dh m <macqueen1 at llnl.gov>
Cc: Mark Leeds <markleeds2 at gmail.com>, "r-help-stat.math.ethz.ch" <r-help at stat.math.ethz.ch>
Subject: Re: [R] regular expression question

On Wed, Jan 14, 2015 at 10:03 AM, MacQueen, Don <macqueen1 at llnl.gov> wrote:
I know you already have a couple of solutions, but I would like to mention
that it can be done in two steps with very simple regular expressions. I
would have done:

s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test",
       'rhofixedtest','norhofixedtest')
res <- gsub('norhofixed$', '',s)
res <- gsub('rhofixed$', '',res)
res
[1] "lngimbint"      "lngimbnoint"    "test"           "rhofixedtest"
[5] "norhofixedtest"


(this is for those of us who don't understand regular expressions very
well!)

There is one possible problem with your solution. Consider the string "arhofixednorhofixed". It ends with "norhofixed" and, according to the original specification, should become "arhofixed". (I admit this is a contrived case that is very unlikely to occur in reality.) But because you apply TWO regular expressions in sequence, the first one removes the trailing "norhofixed" and gives "arhofixed" (the correct answer). The second one then strips the newly exposed trailing "rhofixed", which reduces the string to just "a". The other regular expressions posted in this thread correctly remove either "norhofixed" or "rhofixed", provided they are written _correctly_. They must handle the whole suffix in a single match, either with an alternation that looks for "norhofixed" as well as "rhofixed", or by optionally matching the "no" in front of the "rhofixed" at the very end of the string. To be even more explicit, the regexp "(norhofixed|rhofixed)$" works properly. (The order of the alternatives happens not to matter here. The engine reports the leftmost match, and a trailing "norhofixed" is matched starting at its "n" before any match starting at its "r" is considered.) Yes, regular expressions can be complicated. Although I like them for their expressiveness and power, using them is like using raw nitroglycerin instead of dynamite. Dangerous.
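The pitfall can be checked directly in R. The sketch below is an illustration and not code from the thread. It reuses Don's test vector with John's contrived string "arhofixednorhofixed" appended, and the variable names (two_step, one_pass, one_pass_rev, one_pass_opt) are made up for the example. It compares the two sequential gsub() calls with single anchored patterns: the alternation in both orders, and a variant that makes the "no" optional. It assumes R's default regex engine (TRE), although perl = TRUE should give the same results for these patterns.

```r
# Don's test strings plus John's contrived "perverse case" (element 6)
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test",
       "rhofixedtest", "norhofixedtest", "arhofixednorhofixed")

# Two sequential substitutions, as in the quoted two-step solution
two_step <- gsub("rhofixed$", "", gsub("norhofixed$", "", s))

# One pass: anchored alternation, longer suffix listed first
one_pass <- sub("(norhofixed|rhofixed)$", "", s)

# One pass: same alternation with the alternatives swapped
one_pass_rev <- sub("(rhofixed|norhofixed)$", "", s)

# One pass: optional "no" directly in front of the final "rhofixed"
one_pass_opt <- sub("(no)?rhofixed$", "", s)

two_step[6]                        # "a": the second step ate the exposed "rhofixed"
one_pass[6]                        # "arhofixed"
identical(one_pass, one_pass_rev)  # TRUE
identical(one_pass, one_pass_opt)  # TRUE
```

On the first five strings every approach reproduces Don's output. Only the sixth string separates the sequential version from the single-pass patterns.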



--
While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity.  In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

More information about the R-help mailing list