[R] Regex exercise
Michael Hannon
jm_hannon at yahoo.com
Sat Aug 21 01:39:02 CEST 2010
> For regular expression aficionados, I'd like a cleverer solution to
> the following problem (my solution works just fine for my needs; I'm
> just trying to improve my regex skills):
>
> Given the string (entered, say, at a readline prompt):
>
> "1 2 -5, 3- 6 4 8 5-7 10" ## only integers will be entered
>
> parse it to produce the numeric vector:
>
> c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
>
> Note that "-" in the expression is used to indicate a range of values
> instead of ":"
>
> Here's my UNclever solution:
>
> First collapse runs of spaces to a single space, then replace
> "<any spaces>-<any spaces>" with ":":
>
> > x1 <- gsub(" *- *",":",gsub(" +"," ",resp)) #giving
> > x1
> [1] "1 2:5, 3:6 4 8 5:7 10" ## Note that the comma remains
>
> Next convert the single string into a character vector via strsplit by
> splitting on anything but ":" or a digit:
>
> > x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]] #giving
> > x2
> [1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>
> Finally, parse() the vector, eval() each element, and unlist() the
> resulting list of numeric vectors:
>
> > unlist(lapply(parse(text=x2),eval)) #giving, as desired,
> [1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
>
>
> This seems far too clumsy and circumlocutory not to have a more
> elegant solution from a true regex expert.
>
> (Special note to Thomas Lumley: This seems one of the few instances
> where eval(parse(...)) may actually be appropriate.)
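For reference, the three quoted steps can be collected into one function. This is only a sketch: the name parse_ranges_eval and the input check are assumptions added here, not part of the quoted message. The check is there because eval(parse()) runs whatever text it is given, so it seems prudent to let only digits, spaces, commas and hyphens through.

```r
# Hypothetical helper: the quoted steps wrapped in a function, with an input
# check added (not in the original) so that only digits and ":" ever
# reach parse()/eval().
parse_ranges_eval <- function(resp) {
  if (!grepl("^[-0-9, ]*$", resp)) {
    stop("input may contain only digits, spaces, commas and '-'")
  }
  # Collapse runs of spaces, then turn "<spaces>-<spaces>" into ":"
  x1 <- gsub(" *- *", ":", gsub(" +", " ", resp))
  # Split on anything that is neither ":" nor a digit
  x2 <- strsplit(x1, split = "[^:[:digit:]]+")[[1]]
  x2 <- x2[nzchar(x2)]  # drop the empty piece a leading separator would leave
  # Evaluate each piece ("4" or "2:5") and flatten the result
  unlist(lapply(parse(text = x2), eval))
}

res_eval <- parse_ranges_eval("1 2 -5, 3- 6 4 8 5-7 10")
res_eval
```

Malformed input such as a doubled hyphen still fails, but it fails as a parse error rather than by running unexpected code.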
Howdy. I don't know that I can produce anything less circumlocutory, but I
note that your "x2" form has a simple-enough structure that it can be further
parsed with regular expressions rather than with parse() and eval(). I
don't know that this is an improvement -- just a variation on the theme.
I've appended an example.
-- Mike
#### Original vector
x <- "1 2 -5, 3- 6 4 8 5-7 10"; x

#### Convert ranges to standard R form
x1 <- gsub("[ ]*-[ ]*", ":", x); x1

#### Get rid of the comma
x2 <- gsub(",", " ", x1); x2

#### Remove extra spaces
x3 <- gsub("[ ]+", " ", x2); x3

#### Split off elements, now in standard form
x4 <- unlist(strsplit(x3, " ")); x4

#### Use regular expression for simple parse of elements
#### ("+" lets each endpoint have more than one digit)
x5 <- sapply(x4, function(a) {
    n1 <- gsub("([[:digit:]]+):[[:digit:]]+", "\\1", a)
    n2 <- gsub("[[:digit:]]+:([[:digit:]]+)", "\\1", a)
    n1:n2   # ":" coerces the character endpoints to numbers
}, USE.NAMES=FALSE); x5

x6 <- unlist(x5); x6
##########################################################
> #### Original vector
> x <- "1 2 -5, 3- 6 4 8 5-7 10"; x
[1] "1 2 -5, 3- 6 4 8 5-7 10"
>
> #### Convert ranges to standard R form
> x1 <- gsub("[ ]*-[ ]*", ":", x); x1
[1] "1 2:5, 3:6 4 8 5:7 10"
>
> #### Get rid of the comma
> x2 <- gsub(",", " ", x1); x2
[1] "1 2:5 3:6 4 8 5:7 10"
>
> #### Remove extra spaces
> x3 <- gsub("[ ]+", " ", x2); x3
[1] "1 2:5 3:6 4 8 5:7 10"
>
> #### Split off elements, now in standard form
> x4 <- unlist(strsplit(x3, " ")); x4
[1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>
> #### Use regular expression for simple parse of elements
> x5 <- sapply(x4, function(a) {
+     n1 <- gsub("([[:digit:]]+):[[:digit:]]+", "\\1", a)
+     n2 <- gsub("[[:digit:]]+:([[:digit:]]+)", "\\1", a)
+     n1:n2}, USE.NAMES=FALSE); x5
[[1]]
[1] 1
[[2]]
[1] 2 3 4 5
[[3]]
[1] 3 4 5 6
[[4]]
[1] 4
[[5]]
[1] 8
[[6]]
[1] 5 6 7
[[7]]
[1] 10
> x6 <- unlist(x5); x6
[1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
>
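A more compact variation on the same theme, offered only as a sketch: pull each token out of the original string with a single regular expression, then expand it with seq(), with no parse() or eval() involved. The function name parse_ranges and the token pattern are assumptions of this sketch. It assumes, as in the original question, that only non-negative integers are entered.

```r
# Hypothetical helper, not from the original post: extract each token
# ("4" or "2 -5") with one regular expression, then expand it with seq().
parse_ranges <- function(resp) {
  # A token is an integer, optionally followed by "-" and a second integer,
  # with any number of spaces around the hyphen
  tokens <- regmatches(resp, gregexpr("[0-9]+( *- *[0-9]+)?", resp))[[1]]
  expand <- function(tok) {
    ends <- as.integer(strsplit(tok, " *- *")[[1]])
    seq(ends[1], ends[length(ends)])  # a lone number expands to itself
  }
  unlist(lapply(tokens, expand))
}

res1 <- parse_ranges("1 2 -5, 3- 6 4 8 5-7 10")
res1
res2 <- parse_ranges("8-12 3")   # multi-digit endpoints
res2
```

Because the pattern matches whole runs of digits, endpoints such as 10 or 12 need no special handling, and commas or stray spaces between tokens are simply skipped.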