[R] Help with complicated regular expression
Marc Schwartz
marc_schwartz at me.com
Fri Nov 13 15:33:41 CET 2009
On Nov 13, 2009, at 8:12 AM, Dennis Fisher wrote:
> Colleagues,
>
> I am using R (2.9.2, all platforms) to search for a complicated text
> string using regular expressions. I would appreciate any help you
> can provide.
> The string consists of the following elements:
> SOMEWORDWITHNOSPACES
> any number of spaces and/or tabs
> (
> any number of spaces and/or tabs
> integer
> any number of spaces and/or tabs
> )
>
> Examples include:
> WORD ( 123 )
> WORD(1 )
> WORD\t ( 21\t)
> WORD \t ( 1 \t )
> etc.
>
> I don't need to substitute anything, only to identify if such a
> string exists.
> Any help with regular expressions would be appreciated.
> Thanks.
>
> Dennis
How about this:
Lines <- c("WORD ( 123 )", "WORD(1)", "WORD\t ( 21\t) ",
           "WORD\t ( 21\t) ")
> Lines
[1] "WORD ( 123 )" "WORD(1)" "WORD\t ( 21\t) "
[4] "WORD\t ( 21\t) "
> grep("^[A-Za-z]+.*\\(.*[0-9]+.*\\)", Lines)
[1] 1 2 3 4
You should test it on some real data to see if it works or needs to be
tweaked further.
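
One way to run such a test is sketched below. The test strings and the `expected` vector are made up for illustration (they are not from Dennis's data), and `grepl()` is used instead of `grep()` so that each line gets a TRUE/FALSE answer:

```r
# The pattern from this reply
pat <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"

# Hypothetical test cases: the first three should match, the rest should not
cases <- c("WORD ( 123 )", "WORD(1)", "WORD\t ( 21\t) ",
           "WORD ( )", "WORD 123", "WORD (abc 1 def)")
expected <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# One logical value per test case
found <- grepl(pat, cases)

# Cases where the pattern disagrees with the expectation
cases[found != expected]
```

With these cases the only disagreement should be "WORD (abc 1 def)", which the pattern accepts because .* allows any characters around the digits, not just spaces and tabs.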
^[A-Za-z]+ finds one or more letters at the beginning of the line
.* finds zero or more characters after the word
\\( finds an open paren (the backslash is doubled because the pattern is an R string)
.* finds zero or more characters after the open paren
[0-9]+ finds one or more digits
.* finds zero or more characters after the digits
\\) finds the close paren
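
If the pattern above turns out to be too permissive, a stricter variant might replace each .* with [ \t]* so that only spaces and tabs are allowed between the pieces. This is only a sketch: it assumes the word consists of letters and starts the line, as in the pattern above, and the names `strict` and `test_lines` are invented for the example.

```r
# Stricter sketch: only spaces/tabs may separate the word, parens and integer.
# "\t" inside the R string becomes a literal tab in the bracket expression.
strict <- "^[A-Za-z]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)"

# Hypothetical test lines; the last one has extra text inside the parens
test_lines <- c("WORD ( 123 )", "WORD(1)", "WORD\t ( 21\t) ", "WORD (abc 1 def)")

# TRUE where a line contains the target string
hits <- grepl(strict, test_lines)
hits

# A single yes/no answer for "does such a string exist anywhere?"
any(hits)
```

Here the first three lines should match and the last should not. If the word can also contain digits or other non-space characters, the leading [A-Za-z]+ would need to be adjusted accordingly.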
HTH,
Marc Schwartz