[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number
Greg Snow
Greg.Snow at imail.org
Tue Jun 9 18:26:50 CEST 2009
Here is one way using a single pattern (so it can be used in a substitution). It uses a Perl-style positive lookahead, so perl=TRUE is required:
> test <- c("SHRT","5HRT","M1TCH","M1TCH5","LONG3RS","NONUMBER","TOOLOOOONGG","ooops.3")
>
> sub( '(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}', 'xxx', test, perl=TRUE)
[1] "SHRT" "5HRT" "xxx" "xxx" "xxx"
[6] "NONUMBER" "TOOLOOOONGG" "ooops.3"
>
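A minimal R sketch, assuming each element of `test` is a single candidate token. It adds `^` and `$` anchors to the pattern above so that the whole element is tested, and it shows why the anchors matter for a longer string. The names `whole` and `partial` and the extra string "TOOLOOOONG1" are illustrative assumptions, not part of the original post.

```r
test <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS",
          "NONUMBER", "TOOLOOOONGG", "ooops.3")

# Same lookahead as above, now anchored. (?=...) consumes nothing; it only
# requires a digit within the first 9 characters. [a-zA-Z0-9]{5,9} must then
# span the entire string between ^ and $.
whole <- grepl("^(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}$", test, perl = TRUE)
print(whole)
# [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Unanchored, sub() can rewrite part of an over-long string: the first match
# starts at the third character, so the leading "TO" is left in place.
partial <- sub("(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}", "xxx",
               "TOOLOOOONG1", perl = TRUE)
print(partial)
# [1] "TOxxx"
```

Whether the anchors are wanted depends on the data: for free text containing embedded tokens, the unanchored form (perhaps with word boundaries) is closer to the original gsub() question.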
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
> Sent: Monday, June 08, 2009 6:33 PM
> To: Barry Rowlingson
> Cc: r-help at r-project.org; Tan, Richard
> Subject: Re: [R] Regex question to find a string that contains 5-9
> alpha-numeric characters, at least one of which is a number
>
>
> On Jun 8, 2009, at 5:27 PM, Barry Rowlingson wrote:
>
> > On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard <RTan at panagora.com> wrote:
> >> Hi,
> >>
> >> This is not exactly an R question, but I am trying to use gsub to
> >> replace a string that contains 5-9 alphanumeric characters, at least
> >> one of which is a number. Is there a good way to write it as a
> >> one-line regex?
> >
> > The only way I can think of is to spell out all the possible
> > expressions, something like:
> >
> > [0-9][a-z0-9]{4} | [a-z0-9][0-9][a-z0-9]{3} |
> > [a-z0-9]{2}[0-9][a-z0-9]{2} .... and so on. That is, have a regex
> > component for every possible 5, 6, 7, 8, and 9 character expression
> > with [0-9] in each place. I'm not sure this qualifies as 'good',
> > though.
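The "spell out every case" alternation can be generated rather than typed. A minimal R sketch, assuming ASCII letters and digits are the intended alphanumerics; `alnum`, `alts`, `pat`, and `brute` are hypothetical names. Because it needs no lookahead, it runs on R's default (non-Perl) regex engine.

```r
alnum <- "[A-Za-z0-9]"

# One alternative per (length n, digit position pos): exactly pos - 1
# alphanumerics, a digit, then n - pos more alphanumerics.
alts <- unlist(lapply(5:9, function(n) {
  vapply(seq_len(n), function(pos) {
    before <- pos - 1L  # characters before the forced digit
    after  <- n - pos   # characters after it
    paste0(if (before > 0L) sprintf("%s{%d}", alnum, before) else "",
           "[0-9]",
           if (after > 0L) sprintf("%s{%d}", alnum, after) else "")
  }, character(1))
}))
length(alts)  # 5 + 6 + 7 + 8 + 9 = 35 alternatives

# Anchor the whole alternation so each element must match completely.
pat <- paste0("^(", paste(alts, collapse = "|"), ")$")

test <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS",
          "NONUMBER", "TOOLOOOONGG", "ooops.3")
brute <- grepl(pat, test)  # default engine; no perl = TRUE needed
print(brute)
# [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

The generated pattern is correct but long and opaque, which is the trade-off that makes the two-stage check attractive.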
> >
> > Better to do it in two stages, one to check for 5-9 alphanumerics,
> > and then another to check for a number.
> >
> > Here's something on a test vector 's':
> >
> >> cbind(s,grepl("^[A-Z0-9]{5,9}$",s),grepl("[0-9]",s))
> > s
> > [1,] "SHRT" "FALSE" "FALSE"
> > [2,] "5HRT" "FALSE" "TRUE"
> > [3,] "M1TCH" "TRUE" "TRUE"
> > [4,] "M1TCH5" "TRUE" "TRUE"
> > [5,] "LONG3RS" "TRUE" "TRUE"
> > [6,] "NONUMBER" "TRUE" "FALSE"
> > [7,] "TOOLOOOONGG" "FALSE" "FALSE"
> >
> > The ones you want give two TRUE values. Extending to lower-case is
> > left as an exercise...
> >
> > Barry
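The two TRUE columns can be combined into a single logical vector, which also handles the lower-case "exercise" if `[[:alnum:]]` is used instead of `[A-Z0-9]`. A minimal R sketch, assuming whole-element matching; `right_shape`, `has_digit`, and `keep` are hypothetical names. Note that `[[:alnum:]]` is locale-dependent and may also match accented letters.

```r
test <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS",
          "NONUMBER", "TOOLOOOONGG", "ooops.3")

# Stage 1: the whole element is 5-9 alphanumerics (either case).
right_shape <- grepl("^[[:alnum:]]{5,9}$", test)
# Stage 2: it contains at least one digit.
has_digit <- grepl("[0-9]", test)

# "Both TRUE" as one logical vector.
keep <- right_shape & has_digit
print(test[keep])  # the three elements M1TCH, M1TCH5 and LONG3RS

# Replace whole matching elements; for this test vector the result equals
# the output of Greg's sub() call at the top of the thread.
print(replace(test, keep, "xxx"))
```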
>
>
> I was trying to think of a way to do this with only a single grep(),
> but it has been too long of a day.
>
> So here is a bit of a simplification on the two stage approach:
>
> > vec
> [1] "SHRT" "5HRT" "M1TCH" "M1TCH5" "LONG3RS" "NONUMBER" "TOOLOOOONGG"
>
>
> > grep("[0-9]", vec[grep("^[[:alnum:]]{5,9}$", vec)], value = TRUE)
> [1] "M1TCH" "M1TCH5" "LONG3RS"
>
>
> HTH,
>
> Marc Schwartz
>