[R-pkg-devel] Accessing R's linked PCRE library from inside a package

Oliver Keyes ironholds at gmail.com
Thu Aug 11 01:35:43 CEST 2016


Neat; thanks Dirk! Will be interesting to see if I can get that finagled
on Windows when I get back to Boston.

Best,
Oliver

On Wednesday, 10 August 2016, Dirk Eddelbuettel <edd at debian.org> wrote:

>
> On 10 August 2016 at 18:15, Oliver Keyes wrote:
> | I'm trying to incorporate PCRE-compliant regular expressions into C
> | code in an R package.
> |
> | From digging around in R's source code, it appears that R (pretty
> | much?) guarantees the presence of either a system-level PCRE library,
> | or an R-internal one.[0] Is this exposed (or grabbable) via the R C
> | API in any way?
>
> The key thing to realize here is that R does indeed provide a build
> environment. And at least where I like to work, I get this right off the bat:
>
>     edd@max:/tmp$ grep lpcre /etc/R/*
>     /etc/R/Makeconf:LIBS =  -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
>     edd@max:/tmp$
>
> So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
> essentially "there for the taking". If built as a shared library.
>
> An existence proof is below; it is based on the 2nd Google hit I got for
> 'libpcre example' and has the advantage of being shorter than the first hit.
>
> I first created a baseline. The example, as given and then repaired, gets us:
>
>     edd@max:/tmp$ ./ex_pcre
>      0: From:regular.expressions@example.com
>      1: regular.expressions
>      2: example.com
>      0: From:exddd@43434.com
>      1: exddd
>      2: 43434.com
>      0: From:7853456@exgem.com
>      1: 7853456
>      2: exgem.com
>     edd@max:/tmp$
>
> Turning that into something callable from R took about another minute. It
> looks like this:
>
> -----------------------------------------------------------------------------
> // modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
> #include "pcre.h"
> #include <cstring>   // strlen
> #include <Rcpp.h>
>
> // [[Rcpp::export()]]
> void foo() {
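>     const char *error;       // set by pcre_compile on failure
>     int   erroffset;         // offset in the pattern where compilation failed
>     pcre *re;                // compiled pattern
>     int   rc;                // return value of pcre_exec
>     int   ovector[100];      // match offsets, filled in by pcre_exec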
>
>     const char *regex = "From:([^@]+)@([^\r]+)";
>     char str[]  = "From:regular.expressions@example.com\r\n"\
>                   "From:exddd@43434.com\r\n"\
>                   "From:7853456@exgem.com\r\n";
>
>     re = pcre_compile (regex,          /* the pattern */
>                        PCRE_MULTILINE,
>                        &error,         /* for error message */
>                        &erroffset,     /* for error offset */
>                        0);             /* use default character tables */
>     if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);
>
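>     unsigned int offset = 0;
>     unsigned int len    = strlen(str);
>     // pcre_exec wants the number of ints in ovector, not its size in bytes
>     const int ovecsize  = sizeof(ovector) / sizeof(ovector[0]);
>     while (offset < len &&
>            (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {
>         for (int i = 0; i < rc; ++i) {
>             Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i],
>                     str + ovector[2*i]);
>         }
>         offset = ovector[1];   // resume after the end of this match
>     }
>     pcre_free(re);             // release the compiled pattern
> }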
>
> /*** R
> foo()
> */
> -----------------------------------------------------------------------------
>
> and, lo and behold, produces the same output demonstrating that, yes,
> Veronica, we do get pcre for free:
>
>     R> library(Rcpp)
>     R> sourceCpp("/tmp/oliver.cpp")
>
>     R> foo()
>      0: From:regular.expressions@example.com
>      1: regular.expressions
>      2: example.com
>      0: From:exddd@43434.com
>      1: exddd
>      2: 43434.com
>      0: From:7853456@exgem.com
>      1: 7853456
>      2: exgem.com
>     R>
>
> Your package will probably want a litmus test in configure to see if this
> really holds on the platform it is currently being built on.
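
A rough sketch of the compile-time half of such a litmus test, under stated assumptions: it relies on `__has_include` (standard since C++17, and available earlier in GCC and Clang as an extension), and the names `LITMUS_HAVE_PCRE_H`, `have_pcre_header()` and `pcre_header_version()` are hypothetical, made up for this illustration. It only checks that `pcre.h` is visible and reads the `PCRE_MAJOR`/`PCRE_MINOR` macros from it; it does not prove that `-lpcre` links, which a real configure check would still have to test.

```cpp
// Header-only PCRE litmus sketch; all LITMUS_* names are hypothetical.
#include <string>

// Only probe for the header when the compiler understands __has_include.
#if defined(__has_include)
#  if __has_include(<pcre.h>)
#    include <pcre.h>
#    define LITMUS_HAVE_PCRE_H 1
#  endif
#endif
#ifndef LITMUS_HAVE_PCRE_H
#  define LITMUS_HAVE_PCRE_H 0
#endif

// Two-step stringification so the macro values are expanded first.
// (Avoids treating a minor version such as 08 as an invalid octal literal.)
#define LITMUS_STR2(x) #x
#define LITMUS_STR(x)  LITMUS_STR2(x)

// True when pcre.h was found on the include path at compile time.
bool have_pcre_header() {
    return LITMUS_HAVE_PCRE_H == 1;
}

// Version taken from the header's macros, or "not found" without the header.
std::string pcre_header_version() {
#if LITMUS_HAVE_PCRE_H
    return std::string(LITMUS_STR(PCRE_MAJOR) "." LITMUS_STR(PCRE_MINOR));
#else
    return std::string("not found");
#endif
}
```

A configure script could compile a small program whose `main` prints `pcre_header_version()` and treat "not found" as failure; since no PCRE function is called, the sketch builds even where the library itself is missing.
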
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
>



