[R] Help with complicated regular expression

Marc Schwartz marc_schwartz at me.com
Fri Nov 13 15:33:41 CET 2009


On Nov 13, 2009, at 8:12 AM, Dennis Fisher wrote:

> Colleagues,
>
> I am using R (2.9.2, all platforms) to search for a complicated text  
> string using regular expressions.  I would appreciate any help you  
> can provide.
> The string consists of the following elements:
> 	SOMEWORDWITHNOSPACES
> 	any number of spaces and/or tabs
> 	(
> 	any number of spaces and/or tabs
> 	integer
> 	any number of spaces and/or tabs
> 	)
>
> Examples include:
> 	WORD (  123    )
> 	WORD(1 )
> 	WORD\t ( 21\t)
> 	WORD \t ( 1 \t   )
> etc.
>
> I don't need to substitute anything, only to identify if such a  
> string exists.
> Any help with regular expressions would be appreciated.
> Thanks.
>
> Dennis


How about this:

Lines <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ",
           "WORD\t ( 21\t) ")

 > Lines
[1] "WORD (  123    )" "WORD(1)"          "WORD\t ( 21\t) "
[4] "WORD\t ( 21\t) "

 > grep("^[A-Za-z]+.*\\(.*[0-9]+.*\\)", Lines)
[1] 1 2 3 4

You should test it on some real data to see if it works or needs to be  
tweaked further.
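
One way to run such a test is to include strings that should not match as well as ones that should. The sketch below uses invented test strings (they are assumptions, not Dennis's data) and grepl(), which returns a logical vector and so answers the "does such a string exist" question directly:

```r
# Same pattern as above
pattern <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"

# Hypothetical test strings: one intended match, three intended non-matches
tests <- c("WORD ( 123 )",      # intended match
           "WORD 123",          # no parentheses
           "WORD (abc)",        # no integer inside the parentheses
           "WORD foo (x 1 y)")  # extra text around the integer

hits <- grepl(pattern, tests)
hits
# [1]  TRUE FALSE FALSE  TRUE

# Is there at least one matching string?
any(hits)
```

The last test string matches because .* accepts any characters, not only spaces and tabs. Whether that matters depends on what the real data look like.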

^[A-Za-z]+ finds one or more letters at the beginning of the line
.* finds zero or more characters after the word
\\( finds an open paren
.* finds zero or more characters after the open paren
[0-9]+ finds one or more digits
.* finds zero or more characters after the digits
\\) finds the close paren
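
If only spaces and tabs should be allowed between the elements, as in the original description, a stricter variant might look like the sketch below. It assumes the word consists of letters only and that nothing else follows the close paren on the line. Both are assumptions about the data, and the name strict and the last test string are made up for illustration:

```r
# [ \t]* allows only spaces and tabs where the pattern above uses .*
# The trailing [ \t]*$ assumes nothing but blanks follows the close paren
strict <- "^[A-Za-z]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)[ \t]*$"

Lines2 <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ",
            "WORD (a 1 b)")   # last element should be rejected

res <- grepl(strict, Lines2)
res
# [1]  TRUE  TRUE  TRUE FALSE
```

If the word can contain digits or punctuation, the leading [A-Za-z]+ would need to be widened accordingly.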


HTH,

Marc Schwartz



