[R-pkgs] Package "ore": Oniguruma Regular Expressions

Jon Clayden code at clayden.org
Thu Jan 8 16:47:04 CET 2015


Dear all,

I'm pleased to announce the availability of the "ore" package (for
"Oniguruma Regular Expressions"), which offers an alternative to base
R's functions for searching, splitting and substituting text that
matches (Perl-style) regular expressions. The package uses the
Oniguruma/Onigmo regex library behind the scenes, and offers the
following advantages:

- Regular expressions are themselves first-class objects (of class
"ore"), stored with attributes containing information such as the number
of parenthesised groups present within them. This means that it is not
necessary to compile a particular regex more than once.
- Search results are centred on the matched substrings (including
parenthesised groups), rather than the locations of matches. This saves
extra work with "substr" or similar to extract the matches themselves.
- Substantially better performance than the base R functions,
especially when matching against long strings.
- Substitutions can be functions as well as strings.
- Matching can be restricted efficiently to only part of each string.
- Fewer core functions, with more consistent names.
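
As a brief illustration of the points above, here is a minimal sketch of
how the core functions might be used. It assumes the package has been
installed from CRAN; the example text, pattern and variable names are
invented for illustration only.

```r
# Assumes the "ore" package is installed: install.packages("ore")
library(ore)

# Compile the regex once; the result is an object of class "ore"
re <- ore("(\\w+)@(\\w+)")
text <- "Contact alice@example or bob@test for details"

# Search for every match, then extract the matched substrings and the
# contents of the parenthesised groups directly
m <- ore.search(re, text, all=TRUE)
matched <- matches(m)
grouped <- groups(m)

# The replacement may be a function of the matched text, not just a string
shouted <- ore.subst(re, function(x) toupper(x), text, all=TRUE)

# Split a string wherever the pattern matches
parts <- ore.split(ore("\\s+"), "a b  c")
```

Because "re" is compiled up front, it can be passed to each function in
turn without the pattern being parsed again.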

The package is developed using GitHub, and more information can be found
at <https://github.com/jonclayden/ore>. It is also available on CRAN.

In addition, I've developed a regular expression benchmark, which pits
"ore", "base" and the "stringi" package against the full text of The
Adventures of Sherlock Holmes. It can be obtained and run via
<https://github.com/jonclayden/regex-performance>. Sample output is
available at <http://rpubs.com/jonclayden/regex-performance>.

Contributions to the package or the benchmark are very welcome.

All the best,
Jon


