[Rd] grep and PCRE fun

Jeffrey Horner jeffrey.horner at gmail.com
Thu Sep 29 23:00:42 CEST 2011


Hello,

I think I've found a bug in the C function do_grep located in
src/main/grep.c. It seems to affect both the latest revisions of
R-2-13-branch and trunk when compiling R without optimizations and
with it's own version of pcre located in src/extra, at least on ubuntu
10.04.

 According to the pcre_exec API (I presume the later versions), the
ovecsize argument must be a multiple of 3 , and the ovector argument
must point to a location that can hold at least ovecsize integers. All
the pcre_exec calls made by do_grep, save one, honors this. That one
call seems to overwrite areas of the stack it shouldn't. Here's the
smallest example I found that tickles the bug:

> grep("[^[:blank][:cntrl]]","\\n",perl=TRUE)
Error in grep("[^[:blank][:cntrl]]", "\\n", perl = TRUE) :
  negative length vectors are not allowed

As described above, this error occurs on ubuntu 10.04 when R is
compiled without optimizations ( I typically use CFLAGS="-ggdb"
CXXFLAGS="-ggdb" FFLAGS="-ggdb" ./configure --enable-R-shlib), and the
pcre_exec call executed from do_get overwrites the integer nmatches
and sets it to -1. This has the effect of making do_grep try and
allocate a results vector of length -1, which of course causes the
error message above.

I'd be interested to know if this bug happens on other platforms.

Below is my simple fix for R-2-13-branch (a similar fix works for
trunk as well).

Jeff

$ svn diff main/grep.c
Index: main/grep.c
===================================================================
--- main/grep.c	(revision 57110)
+++ main/grep.c	(working copy)
@@ -723,7 +723,7 @@
 {
     SEXP pat, text, ind, ans;
     regex_t reg;
-    int i, j, n, nmatches = 0, ov, rc;
+    int i, j, n, nmatches = 0, ov[3], rc;
     int igcase_opt, value_opt, perl_opt, fixed_opt, useBytes, invert;
     const char *spat = NULL;
     pcre *re_pcre = NULL /* -Wall */;
@@ -882,7 +882,7 @@
 	    if (fixed_opt)
 		LOGICAL(ind)[i] = fgrep_one(spat, s, useBytes, use_UTF8, NULL) >= 0;
 	    else if (perl_opt) {
-		if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, &ov, 0) >= 0)
+		if (pcre_exec(re_pcre, re_pe, s, strlen(s), 0, 0, ov, 3) >= 0)
 		    INTEGER(ind)[i] = 1;
 	    } else {
 		if (!use_WC)



More information about the R-devel mailing list