[R] regular expression question

Loris Bennett loris.bennett at fu-berlin.de
Tue Jan 13 08:59:05 CET 2015


Hi Mark,

Mark Leeds <markleeds2 at gmail.com> writes:

> Hi All: I have a regular expression problem. If a character string ends
> with "rhofixed" or "norhofixed", I want that part of the string to be
> removed. If it doesn't end with either of those two endings, then the
> result should be the same as the original. Below doesn't work for the
> second case. I know why but not how to fix it. I looked at Friedl's book
> and I bet it's in there somewhere but I can't find it. Thanks.
>
> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")
>
> result <- sub("^(.*)([n.*|r.*].*)$","\\1",s)
>
>  print(result)
> [1] "lngimbint"     "lngimbnointno" "test"
>

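The matching of the initial .* is greedy by default, so it matches
everything up to the last character that your bracket expression
accepts.  Note that [n.*|r.*] is a character class: it matches one
single 'n', '.', '*', '|' or 'r', not "n.*" or "r.*".  As 'rho' always
contains an 'r', the 'no' in 'norhofixed' gets swallowed by the first
group.  You can make a quantifier non-greedy by appending '?' to it,
i.e. writing .*? instead of .*.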
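To see both effects, here is a short sketch (not from the original
thread) using Mark's vector s and only base R's grepl() and sub() with
the default regex engine.  The first call checks which single characters
the bracket expression accepts; the other two compare a greedy and a
lazy first group in otherwise identical patterns.  The names in_class,
greedy and lazy are just illustrative.

```r
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test")

# Inside [...] the characters n . * | r are each matched literally and
# individually: the bracket expression stands for ONE character.
in_class <- grepl("^[n.*|r.*]$", c("n", ".", "*", "|", "r", "x"))
print(in_class)  # TRUE for everything except "x"

# Greedy (.*) takes as much as it can, so the optional (no)? ends up
# matching the empty string and "no" stays in the first group.
greedy <- sub("^(.*)((no)?rhofixed)$", "\\1", s)
print(greedy)

# Lazy (.*?) takes as little as it can, so (no)? gets to match "no".
lazy <- sub("^(.*?)((no)?rhofixed)$", "\\1", s)
print(lazy)
```

The greedy version should reproduce the "lngimbnointno" problem from the
original post, while the lazy one strips the full "norhofixed".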

I would do

> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")
> result <- sub("^(.*?)((no)?rhofixed)$","\\1",s)
> result
[1] "lngimbint"   "lngimbnoint" "test"
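A possibly simpler alternative (my own sketch, not part of the reply
above) is to match only the unwanted suffix, anchored with $, and
replace it with the empty string, so no capture group or lazy
quantifier is needed.  The name stripped is just illustrative.

```r
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test")

# Match only the suffix "rhofixed", optionally preceded by "no", at the
# very end of the string, and delete it. Strings without either suffix
# do not match and are returned unchanged.
stripped <- sub("(no)?rhofixed$", "", s)
print(stripped)
```

This relies on sub() using the leftmost match: when "norhofixed" is
present, the earliest position where (no)?rhofixed$ can match is its
'n', so the 'no' is removed together with 'rhofixed'.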

Cheers,

Loris
