[R] Regex exercise
Bert Gunter
gunter.berton at gene.com
Fri Aug 20 22:55:27 CEST 2010
For regular expression aficionados, I'd like a cleverer solution to
the following problem (my solution works just fine for my needs; I'm
just trying to improve my regex skills):
Given the string (entered, say, at a readline prompt):
"1 2 -5, 3- 6 4 8 5-7 10" ## only integers will be entered
parse it to produce the numeric vector:
c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
Note that "-" in the expression is used to indicate a range of values
instead of ":".
Here's my UNclever solution:
First convert more than one space to a single space and then replace
"<any spaces>-<any spaces>" by ":" by:
> x1 <- gsub(" *- *",":",gsub(" +"," ",resp)) #giving
> x1
[1] "1 2:5, 3:6 4 8 5:7 10" ## Note that the comma remains
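As a minimal, self-contained sketch of this first step: resp is assumed to hold the string typed at the prompt, and is hard-coded here rather than read with readline(). The intermediate name collapsed is only for illustration.

```r
# Assumption: resp holds the raw input; hard-coded here instead of readline().
resp <- "1 2 -5, 3- 6 4 8 5-7 10"

# Collapse every run of spaces to a single space.
collapsed <- gsub(" +", " ", resp)

# Replace each hyphen, together with any spaces around it, by ":".
x1 <- gsub(" *- *", ":", collapsed)
print(x1)
```

Since " *- *" already absorbs the spaces around each hyphen, the space-collapsing call mainly tidies the spacing between the remaining tokens.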
Next convert the single string into a character vector via strsplit by
splitting on anything but ":" or a digit:
> x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]] #giving
> x2
[1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
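The splitting step can be tried on its own as below. This is only a sketch: x1 is hard-coded to the value shown above so that the snippet runs by itself.

```r
# Assumption: x1 is the result of the previous step, hard-coded here.
x1 <- "1 2:5, 3:6 4 8 5:7 10"

# The negated class [^:[:digit:]] matches any character that is neither ":"
# nor a digit; the trailing "+" treats a run such as ", " as one separator.
x2 <- strsplit(x1, split = "[^:[:digit:]]+")[[1]]
print(x2)
```

Because the comma and the spaces both fall into the negated class, no separate step is needed to remove the comma.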
Finally, parse() the vector, eval() each element, and unlist() the
resulting list of numeric vectors:
> unlist(lapply(parse(text=x2),eval)) #giving, as desired,
[1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
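Put together, the three steps might be wrapped in a single function, roughly as follows. The name parse_ranges is hypothetical and not part of the solution above; the sketch assumes the input contains only non-negative integers, spaces, commas and hyphens.

```r
# parse_ranges is a hypothetical helper name used only for illustration.
parse_ranges <- function(resp) {
  # Step 1: normalise spaces and turn "-" (with surrounding spaces) into ":".
  x1 <- gsub(" *- *", ":", gsub(" +", " ", resp))
  # Step 2: split on anything that is not ":" or a digit.
  x2 <- strsplit(x1, split = "[^:[:digit:]]+")[[1]]
  # Step 3: parse each token as R code ("2:5" becomes the call 2:5) and evaluate it.
  unlist(lapply(parse(text = x2), eval))
}

result <- parse_ranges("1 2 -5, 3- 6 4 8 5-7 10")
print(result)
```

Only digits and ":" survive the split, so the text handed to parse() is limited to those characters.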
This seems far too clumsy and circumlocutory not to have a more
elegant solution from a true regex expert.
(Special note to Thomas Lumley: This seems one of the few instances
where eval(parse(...)) may actually be appropriate.)
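For comparison only, the last step could also be written without eval(parse(...)), by splitting each token on ":" and calling seq() on the endpoints. This is a sketch, not an improvement on the regex itself; expand_token is a hypothetical helper, and each token is assumed to be either "a" or "a:b" with integer a and b.

```r
# expand_token is a hypothetical helper, not part of the solution above.
expand_token <- function(tok) {
  # "2:5" -> c(2L, 5L); "4" -> 4L
  bounds <- as.integer(strsplit(tok, ":", fixed = TRUE)[[1]])
  # A single number stands for itself; a pair is expanded to the full range.
  if (length(bounds) == 1L) bounds else seq(bounds[1], bounds[2])
}

# Assumption: x2 is the character vector produced by the strsplit step.
x2 <- c("1", "2:5", "3:6", "4", "8", "5:7", "10")
result <- unlist(lapply(x2, expand_token))
print(result)
```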
Cheers to all,
Bert
--
Bert Gunter
Genentech Nonclinical Biostatistics