[R-pkg-devel] Accessing R's linked PCRE library from inside a package
Dirk Eddelbuettel
edd at debian.org
Thu Aug 11 01:28:08 CEST 2016
On 10 August 2016 at 18:15, Oliver Keyes wrote:
| I'm trying to incorporate PCRE-compliant regular expressions into C
| code in an R package.
|
| From digging around in R's source code, it appears that R (pretty
| much?) guarantees the presence of either a system-level PCRE library,
| or an R-internal one.[0] Is this exposed (or grabbable) via the R C
| API in any way?
The key thing to realize here is that R does indeed provide an environment.
And at least where I like to work, I get this right off the bat:
edd@max:/tmp$ grep lpcre /etc/R/*
/etc/R/Makeconf:LIBS = -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
edd@max:/tmp$
So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
essentially "there for the taking". If built as a shared library.
An existence proof is below; it is based on the 2nd Google hit I got for
'libpcre example' and has the advantage of being shorter than the first hit.
I first created a baseline. The example, as given and then repaired, gets us:
edd@max:/tmp$ ./ex_pcre
0: From:regular.expressions@example.com
1: regular.expressions
2: example.com
0: From:exddd@43434.com
1: exddd
2: 43434.com
0: From:7853456@exgem.com
1: 7853456
2: exgem.com
edd@max:/tmp$
Turning that into something callable from R took about another minute. It
looks like this:
-----------------------------------------------------------------------------
// modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
#include <cstring>                      // strlen
#include "pcre.h"
#include <Rcpp.h>

// [[Rcpp::export()]]
void foo() {
    const char *error;
    int erroffset;
    pcre *re;
    int rc;
    int ovector[100];
    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[] = "From:regular.expressions@example.com\r\n"
                 "From:exddd@43434.com\r\n"
                 "From:7853456@exgem.com\r\n";

    re = pcre_compile(regex,            /* the pattern */
                      PCRE_MULTILINE,
                      &error,           /* for error message */
                      &erroffset,       /* for error offset */
                      0);               /* use default character tables */
    if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);

    unsigned int offset = 0;
    unsigned int len = strlen(str);
    /* pcre_exec wants the number of ints in ovector, not its size in bytes */
    const int ovecsize = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len &&
           (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {
        for (int i = 0; i < rc; ++i) {
            /* ovector holds (start, end) offset pairs; pair 0 is the whole match */
            Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];            /* resume right after the last match */
    }
    pcre_free(re);                      /* release the compiled pattern */
}
/*** R
foo()
*/
-----------------------------------------------------------------------------
and, lo and behold, produces the same output demonstrating that, yes,
Veronica, we do get pcre for free:
R> library(Rcpp)
R> sourceCpp("/tmp/oliver.cpp")
R> foo()
0: From:regular.expressions@example.com
1: regular.expressions
2: example.com
0: From:exddd@43434.com
1: exddd
2: 43434.com
0: From:7853456@exgem.com
1: 7853456
2: exgem.com
R>
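As an aside on the match loop in foo(): pcre_exec() reports each match as
pairs of byte offsets in ovector, with pair 0 covering the whole match and
pair i covering capture group i, and its return value is the number of pairs
it filled in. A minimal sketch of that layout, which needs no PCRE to
compile, follows. The helper name groups_from_ovector is my own invention
for illustration and is not part of PCRE or Rcpp.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE or Rcpp): turn the offset pairs that
// pcre_exec() writes into ovector into substrings of the subject string.
// rc is the value pcre_exec() returned, i.e. the number of pairs filled in.
std::vector<std::string> groups_from_ovector(const char *subject,
                                             const int *ovector, int rc) {
    std::vector<std::string> out;
    for (int i = 0; i < rc; ++i) {
        int start = ovector[2 * i];      // offset of the first character of group i
        int end   = ovector[2 * i + 1];  // offset one past its last character
        if (start < 0) {
            out.push_back("");           // -1 marks a group that did not participate
        } else {
            out.push_back(std::string(subject + start, end - start));
        }
    }
    return out;
}
```

Given the subject "From:a@b.com" and the offsets {0, 12, 5, 6, 7, 12} with
rc = 3, this yields the whole match, then "a", then "b.com", which is the
same 0/1/2 ordering as in the output above.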
Your package will probably want a litmus test in configure to see if this
really holds on the platform it is currently being built on.
Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org