[R-pkg-devel] Accessing R's linked PCRE library from inside a package
Oliver Keyes
ironholds at gmail.com
Thu Aug 11 01:35:43 CEST 2016
Neat; thanks Dirk! Will be interesting to see if I can get that finagled
on Windows when I get back to Boston.
Best,
Oliver
On Wednesday, 10 August 2016, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 10 August 2016 at 18:15, Oliver Keyes wrote:
> | I'm trying to incorporate PCRE-compliant regular expressions into C
> | code in an R package.
> |
> | From digging around in R's source code, it appears that R (pretty
> | much?) guarantees the presence of either a system-level PCRE library,
> | or an R-internal one.[0] Is this exposed (or grabbable) via the R C
> | API in any way?
>
> The key thing to realize here is that R does indeed provide a build
> environment. And at least where I like to work, I get this right off the bat:
>
> edd@max:/tmp$ grep lpcre /etc/R/*
> /etc/R/Makeconf:LIBS = -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
> edd@max:/tmp$
>
> So pcre, plus a bunch of compression libraries (lzma, bz2, z) and more, are
> essentially "there for the taking", at least when R itself is built as a
> shared library.
>
> An existence proof is below; it is based on the 2nd Google hit I got for
> 'libpcre example' and has the advantage of being shorter than the first hit.
>
> I first created a baseline. The example, as given and then repaired, gets us:
>
> edd@max:/tmp$ ./ex_pcre
> 0: From:regular.expressions@example.com
> 1: regular.expressions
> 2: example.com
> 0: From:exddd@43434.com
> 1: exddd
> 2: 43434.com
> 0: From:7853456@exgem.com
> 1: 7853456
> 2: exgem.com
> edd@max:/tmp$
>
> Turning that into something callable from R took about another minute. It
> looks like this:
>
> -----------------------------------------------------------------------------
> // modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
> #include <cstring>      // std::strlen()
> #include <pcre.h>
> #include <Rcpp.h>
>
> // [[Rcpp::export]]
> void foo() {
>     const char *error;
>     int erroffset;
>     pcre *re;
>     int rc;
>     const int ovecsize = 30;        // PCRE expects a multiple of 3
>     int ovector[ovecsize];
>
>     const char *regex = "From:([^@]+)@([^\r]+)";
>     const char str[] = "From:regular.expressions@example.com\r\n"
>                        "From:exddd@43434.com\r\n"
>                        "From:7853456@exgem.com\r\n";
>
>     re = pcre_compile(regex,           /* the pattern */
>                       PCRE_MULTILINE,  /* options */
>                       &error,          /* for error message */
>                       &erroffset,      /* for error offset */
>                       NULL);           /* use default character tables */
>     if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s",
>                         erroffset, error);
>
>     int offset = 0;
>     int len = static_cast<int>(std::strlen(str));
>     // pcre_exec() takes the ovector size as a count of ints, not bytes, and
>     // returns one more than the highest-numbered captured substring it set
>     while (offset < len &&
>            (rc = pcre_exec(re, NULL, str, len, offset, 0,
>                            ovector, ovecsize)) >= 0) {
>         for (int i = 0; i < rc; ++i) {
>             Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i],
>                     str + ovector[2*i]);
>         }
>         offset = ovector[1];        // resume after the current match
>     }
>     pcre_free(re);                  // release the compiled pattern
> }
>
> /*** R
> foo()
> */
> -----------------------------------------------------------------------------
>
> and, lo and behold, produces the same output demonstrating that, yes,
> Veronica, we do get pcre for free:
>
> R> library(Rcpp)
> R> sourceCpp("/tmp/oliver.cpp")
>
> R> foo()
> 0: From:regular.expressions@example.com
> 1: regular.expressions
> 2: example.com
> 0: From:exddd@43434.com
> 1: exddd
> 2: 43434.com
> 0: From:7853456@exgem.com
> 1: 7853456
> 2: exgem.com
> R>
>
> Your package will probably want to add a litmus test in configure to see if
> this really holds on the platform it is currently being built on.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
>
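The loop in foo() relies on how pcre_exec() reports matches through its output vector. A minimal sketch of that bookkeeping, in plain C++ so it runs without libpcre or R: pair i of the ovector is the half-open range [ovector[2*i], ovector[2*i+1]), pair 0 is the whole match, and unset groups are reported as -1. The helper names extract_groups() and next_offset() are hypothetical, and the empty-match step in next_offset() is a simplification (PCRE's bundled pcredemo.c handles that case more carefully).

```cpp
// Minimal sketch: interpreting pcre_exec()'s ovector without PCRE itself.
// extract_groups() and next_offset() are hypothetical helper names.
#include <cassert>
#include <string>
#include <vector>

// Pair i of the ovector is [ovector[2*i], ovector[2*i+1]). Pair 0 is the
// whole match; pairs 1..rc-1 are the capture groups. A group that did not
// take part in the match has both offsets set to -1.
std::vector<std::string> extract_groups(const std::string& subject,
                                        const int* ovector, int rc) {
    assert(rc > 0 && "call only after a successful match");
    std::vector<std::string> groups;
    for (int i = 0; i < rc; ++i) {
        const int start = ovector[2 * i];
        const int end = ovector[2 * i + 1];
        if (start < 0) {
            groups.emplace_back();  // unset group: store an empty string
        } else {
            groups.push_back(subject.substr(start, end - start));
        }
    }
    return groups;
}

// Offset for the next search: resume at the end of the current match, as
// foo() does, but step one character past an empty match so a loop that
// uses this cannot stall at the same position.
int next_offset(const int* ovector) {
    return ovector[1] > ovector[0] ? ovector[1] : ovector[1] + 1;
}
```

With a hand-filled ovector of {0, 36, 5, 24, 25, 36} and rc = 3 for the first input line (the offsets expected for "From:regular.expressions@example.com", not output captured from pcre_exec()), extract_groups() yields the whole match, "regular.expressions" and "example.com", and next_offset() returns 36, which is where foo() resumes its search.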
More information about the R-package-devel mailing list