[R] Regex exercise
Greg Snow
Greg.Snow at imail.org
Mon Aug 23 22:21:36 CEST 2010
How about:
x <- "1 2 -5, 3- 6 4 8 5-7 10"; x
library(gsubfn)
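# Alternative 1 matches a range "a - b" (groups one, two, three);
# alternative 2 matches a single integer (group four).
strapply( x, '(([0-9]+) *- *([0-9]+))|([0-9]+)',
    function(one,two,three,four) {
        # four is non-empty only when a single integer matched
        if( nchar(four) > 0 ) return( as.numeric(four) )
        return( seq( from=as.numeric(two), to=as.numeric(three) ) )
    }
)[[1]]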
If x is a vector of strings and you remove the [[1]], you will get a list with one element per string in x (unlisting it will give a single vector).
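As a minimal sketch of the vector case, assuming the gsubfn package is installed: the two-element input xs and the function name expand below are made up for illustration and are not part of the original example.

```r
library(gsubfn)  # assumes the gsubfn package is installed

# Same callback as above: expand "a-b" into a:b, pass single integers through.
# The name `expand` is only for illustration.
expand <- function(one, two, three, four) {
  if (nchar(four) > 0) return(as.numeric(four))
  seq(from = as.numeric(two), to = as.numeric(three))
}

# Hypothetical two-element input; any character vector is handled the same way.
xs <- c("1 2 -5, 3- 6", "4 8 5-7 10")

# No [[1]] here, so the result is a list with one element per string in xs.
res <- strapply(xs, '(([0-9]+) *- *([0-9]+))|([0-9]+)', expand)

res          # list of numeric vectors, one per input string
unlist(res)  # a single numeric vector
```

Keeping the list form is useful when each string needs to be traced back to its own result; unlist() is the shortcut when only the pooled values matter.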
This could easily be extended to handle floating point numbers instead of just integers, and even negative numbers (as long as you have a clear rule to distinguish a negative sign from the "-" that separates the two ends of a range).
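One possible sketch of such an extension, written in base R rather than with strapply so that it has no package dependency. The rule used here is an assumption, not something stated above: a "-" that follows a number (possibly after spaces) is treated as a range separator, and any other "-" is a negative sign. Under that rule "2 -5" is still the range 2 to 5, while "2, -5" is the two numbers 2 and -5. The helper name parse_ranges is hypothetical, and ranges between non-integers simply step by 1 from the first endpoint, as seq() does by default.

```r
# Assumed rule (not from the original post): a "-" that follows a number,
# possibly after spaces, is a range separator; any other "-" is a negative sign.
num  <- "-?[0-9]+(\\.[0-9]+)?"               # optional sign, digits, optional fraction
item <- paste0(num, "( *- *", num, ")?")     # a number, optionally "- number"

# Anchored version with the two endpoints in capture groups 1 and 4.
item_parts <- paste0("^(", num, ")( *- *(", num, "))?$")

parse_ranges <- function(x) {                # hypothetical helper name
  # Pull out each "number" or "number - number" item, scanning left to right.
  items <- regmatches(x, gregexpr(item, x))[[1]]
  # Split each item into full match plus capture groups.
  parts <- regmatches(items, regexec(item_parts, items))
  unlist(lapply(parts, function(p) {
    from <- as.numeric(p[2])                 # group 1: first endpoint
    if (p[5] == "") return(from)             # group 4 empty: single number
    seq(from = from, to = as.numeric(p[5]))  # range, stepping by 1 from `from`
  }))
}

parse_ranges("1 2 -5, 3- 6 4 8 5-7 10")  # the original integer example
parse_ranges("1, -2")                    # comma breaks the range, so -2 is negative
parse_ranges("-1 - 1")                   # range starting at a negative number
parse_ranges("0.5-2.5")                  # floating point endpoints
```

The price of this rule is that a negative number can only follow another number if something other than spaces (a comma, for instance) separates them; a different convention would need a different pattern.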
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Bert Gunter
> Sent: Friday, August 20, 2010 2:55 PM
> To: r-help at r-project.org
> Subject: [R] Regex exercise
>
> For regular expression aficionados, I'd like a cleverer solution to
> the following problem (my solution works just fine for my needs; I'm
> just trying to improve my regex skills):
>
> Given the string (entered, say, at a readline prompt):
>
> "1 2 -5, 3- 6 4 8 5-7 10" ## only integers will be entered
>
> parse it to produce the numeric vector:
>
> c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
>
> Note that "-" in the expression is used to indicate a range of values
> instead of ":"
>
> Here's my UNclever solution:
>
> First convert more than one space to a single space and then replace
> "<any spaces>-<any spaces>" by ":" by:
>
> > x1 <- gsub(" *- *",":",gsub(" +"," ",resp)) #giving
> > x1
> [1] "1 2:5, 3:6 4 8 5:7 10" ## Note that the comma remains
>
> Next convert the single string into a character vector via strsplit by
> splitting on anything but ":" or a digit:
>
> > x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]] #giving
> > x2
> [1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>
> Finally, parse() the vector, eval() each element, and unlist() the
> resulting list of numeric vectors:
>
> > unlist(lapply(parse(text=x2),eval)) #giving, as desired,
> [1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
>
>
> This seems far too clumsy and circumlocutory not to have a more
> elegant solution from a true regex expert.
>
> (Special note to Thomas Lumley: This seems one of the few instances
> where eval(parse(...)) may actually be appropriate.)
>
> Cheers to all,
>
> Bert
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.