[R] regexp inside and outside brackets
Marc Schwartz
marc_schwartz at me.com
Fri Dec 11 15:39:09 CET 2015
> On Dec 11, 2015, at 7:50 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>
> For the regexp aficionados, out there:
>
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.
>
> This would be the test string:
> "A1{0}~B0{1} CO{a2}NN{12}"
>
> Everything outside the brackets would be:
>
> "A1 ~B0 CO NN"
>
> and everything inside the brackets would be:
>
> "0 1 a2 12"
>
> I have a working solution involving strsplit(), but I wonder if there is a
> more direct way.
> Thanks in advance for any hint,
> Adrian
x <- "A1{0}~B0{1} CO{a2}NN{12}"
The first is a bit easier:
> gsub("\\{[[:alnum:]]*\\}", " ", x)
[1] "A1 ~B0  CO NN "
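If the exact string from the question is wanted, the leftover whitespace can be tidied in a second pass. The replacement leaves a trailing space, and two spaces in a row wherever a brace group was already followed by one. What follows is only a sketch, assuming that collapsing every run of whitespace to a single space is acceptable; the name "outside" is purely illustrative, and trimws() requires R >= 3.2.0:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Replace each brace group with a single space, as above
outside <- gsub("\\{[[:alnum:]]*\\}", " ", x)

# Collapse runs of whitespace, then drop leading/trailing whitespace
outside <- trimws(gsub("[[:space:]]+", " ", outside))
outside
```

With the test string this should give "A1 ~B0 CO NN".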
The second is a bit more cumbersome, at least using standard functions, because the pattern matches several times in the string:
> gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
[1] "0" "1" "a2" "12"
Note that a multi-element vector is returned.
In the above:
> gregexpr("\\{[[:alnum:]]+\\}", x)
[[1]]
[1] 3 9 15 21
attr(,"match.length")
[1] 3 3 4 4
attr(,"useBytes")
[1] TRUE
returns the starting positions and lengths of the matches, which regmatches() uses to extract the actual values:
> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))
[[1]]
[1] "{0}" "{1}" "{a2}" "{12}"
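To make the role of the match data concrete, the same extraction can be done by hand with substring(). This is just an illustration of what the positions and lengths mean, not a description of how regmatches() is implemented; the variable names are made up for the example:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Match data for the first (and only) element of x
m <- gregexpr("\\{[[:alnum:]]+\\}", x)[[1]]

starts <- as.integer(m)             # start position of each match
lens   <- attr(m, "match.length")   # length of each match

# Cut out each match: from its start to start + length - 1
by_hand <- substring(x, starts, starts + lens - 1)
by_hand
```

The result should be the same four strings that regmatches() returns.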
The outer gsub() then strips the braces from each match by replacing them with an empty string.
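A possible variation, offered only as a sketch: with perl = TRUE, lookbehind and lookahead assertions can require the braces without including them in the match, so the separate gsub() step is not needed. paste() with collapse then gives the single string from the original question. The names "inside" and "inside_string" are illustrative only:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# (?<=\\{) requires a preceding "{" and (?=\\}) a following "}",
# but neither brace becomes part of the match itself
inside <- regmatches(x, gregexpr("(?<=\\{)[[:alnum:]]+(?=\\})", x, perl = TRUE))[[1]]
inside

# Join the pieces into one space-separated string
inside_string <- paste(inside, collapse = " ")
inside_string
```

For the test string, inside_string should be "0 1 a2 12".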
You could invert the result of regmatches() to get:
> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
[1] "A1" "~B0" " CO" "NN" ""
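The inverted result still contains a leading space and an empty trailing element. A minimal sketch for turning it into the single string from the question, assuming surrounding whitespace and empty pieces can simply be discarded ("pieces" and "outside_string" are illustrative names; trimws() requires R >= 3.2.0):

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Everything between the matches, as a character vector
pieces <- regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]

# Trim each piece and drop the empty ones
pieces <- trimws(pieces)
pieces <- pieces[nzchar(pieces)]

# Join into one space-separated string
outside_string <- paste(pieces, collapse = " ")
outside_string
```

With the test string this should yield "A1 ~B0 CO NN".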
Of course, all of this presumes that the braces are not nested and that their contents are strictly alphanumeric.
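A short sketch of how those assumptions show up in practice. The input strings here are invented for illustration and are not from the original question; the wider pattern at the end is one possible relaxation, not a general solution for nested braces:

```r
pat <- "\\{[[:alnum:]]+\\}"

# Nested braces: only the innermost group satisfies the pattern
nested <- "A{b{c}}"
nested_hits <- regmatches(nested, gregexpr(pat, nested))[[1]]
nested_hits

# Non-alphanumeric contents: the pattern finds nothing at all
other <- "A{a-2}B{x y}"
other_hits <- regmatches(other, gregexpr(pat, other))[[1]]
other_hits

# Accepting anything except braces between "{" and "}" is more permissive
wide_hits <- regmatches(other, gregexpr("\\{[^{}]*\\}", other))[[1]]
wide_hits
```

Truly nested structures are generally better handled with a small parser than with a single regular expression.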
Regards,
Marc Schwartz