[R-pkg-devel] Accessing R's linked PCRE library from inside a package

Dirk Eddelbuettel edd at debian.org
Thu Aug 11 01:28:08 CEST 2016


On 10 August 2016 at 18:15, Oliver Keyes wrote:
| I'm trying to incorporate PCRE-compliant regular expressions into C
| code in an R package.
| 
| From digging around in R's source code, it appears that R (pretty
| much?) guarantees the presence of either a system-level PCRE library,
| or an R-internal one.[0] Is this exposed (or grabbable) via the R C
| API in any way?

The key thing to realize here is that R does indeed provide an environment,
including the libraries it was itself linked against. And at least where I
like to work, I get this right off the bat:

    edd@max:/tmp$ grep lpcre /etc/R/*
    /etc/R/Makeconf:LIBS =  -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
    edd@max:/tmp$ 

So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
essentially "there for the taking", at least if R was built as a shared library.

An existence proof is below; it is based on the 2nd Google hit I got for
'libpcre example' and has the advantage of being shorter than the first hit.

I first created a baseline. The example, as given and then repaired, gets us:

    edd@max:/tmp$ ./ex_pcre 
     0: From:regular.expressions@example.com
     1: regular.expressions
     2: example.com
     0: From:exddd@43434.com
     1: exddd
     2: 43434.com
     0: From:7853456@exgem.com
     1: 7853456
     2: exgem.com
    edd@max:/tmp$ 

Turning that into something callable from R took about another minute. It
looks like this:

-----------------------------------------------------------------------------
// modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
#include <pcre.h>
#include <cstring>
#include <Rcpp.h>

// pcre_exec() counts the ovector in ints (not bytes); size must be a multiple of 3
#define OVECCOUNT 30

// [[Rcpp::export]]
void foo() {
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   ovector[OVECCOUNT];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"
                  "From:exddd@43434.com\r\n"
                  "From:7853456@exgem.com\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE, /* options */
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);

    int offset = 0;
    int len    = strlen(str);
    /* rc > 0 is one more than the highest capture group set in this match */
    while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, OVECCOUNT)) >= 0) {
        for (int i = 0; i < rc; ++i) {
            Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];   /* resume the search after the end of this match */
    }
    pcre_free(re);             /* release the compiled pattern */
}

/*** R
foo()
*/
-----------------------------------------------------------------------------

and, lo and behold, it produces the same output, demonstrating that, yes,
Veronica, we do get pcre for free:

    R> library(Rcpp)
    R> sourceCpp("/tmp/oliver.cpp")
    
    R> foo()
     0: From:regular.expressions@example.com
     1: regular.expressions
     2: example.com
     0: From:exddd@43434.com
     1: exddd
     2: 43434.com
     0: From:7853456@exgem.com
     1: 7853456
     2: exgem.com
    R> 
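One detail of the pcre_exec() call worth spelling out: its ovecsize argument counts ints, not bytes, and should be a multiple of 3, because PCRE uses the first two-thirds of the vector for (start, end) offset pairs and the last third as scratch space. With 30 ints that leaves room for the whole match plus 9 capture groups. A minimal sketch of decoding those pairs into strings follows; extract_groups() is a hypothetical helper name, not part of PCRE or Rcpp, and it assumes the PCRE1 convention that an unset group reports both offsets as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE or Rcpp): turn the offsets that
// pcre_exec() wrote into `ovector` into one std::string per capture group.
// `rc` is pcre_exec()'s positive return value, i.e. one more than the highest
// group that was set; pair i is (ovector[2*i], ovector[2*i+1]), and i == 0 is
// the whole match.
std::vector<std::string> extract_groups(const char *subject, const int *ovector, int rc) {
    // rc == 0 means the ovector was too small; rc < 0 means no match or an error
    assert(rc > 0);
    std::vector<std::string> groups;
    groups.reserve(static_cast<std::size_t>(rc));
    for (int i = 0; i < rc; ++i) {
        int start = ovector[2 * i];
        int end   = ovector[2 * i + 1];
        if (start < 0) {
            groups.emplace_back();  // unset group: both offsets are -1
        } else {
            groups.emplace_back(subject + start, static_cast<std::size_t>(end - start));
        }
    }
    return groups;
}
```

Inside the loop of foo(), calling extract_groups(str, ovector, rc) would yield the same three strings per line that the Rprintf() call prints; returning them instead of printing them is what a package function handing results back to R would more likely want.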

Your package will probably want to add a litmus test in configure to see if
this really holds on the platform it is currently being built on.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org