[Rd] Bug in agrep computing edit distance?
Dickison, Daniel
ddickison at carnegielearning.com
Thu Nov 18 16:56:57 CET 2010
A follow-up to my earlier message. I got R to compile, and the patch below
seems to fix the issue (I don't think my previous attachment came through,
so it's pasted inline).
One quirk remains: insertions at the end of the string seem to cost 1 extra,
and I'm not sure why. In the first example below, elements 3 and 5 ("ax1"
and "x12") should match, and in the second, element 5 ("x12") should match,
but they don't unless max.distance=3:
> agrep("x", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 2 4
> agrep("ax1", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 3 4 6 7
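For reference, here is a minimal sketch of the whole-string edit distances I would expect for these inputs. It is a plain Levenshtein distance written in R, assuming a cost of 1 for each insertion, deletion and substitution. `lev` is a hypothetical helper used only for illustration; it is not part of R or of the patch.

```r
## Plain Levenshtein distance with unit costs.
## Hypothetical helper for illustration; not part of R or of the patch.
lev <- function(a, b) {
    a <- strsplit(a, "")[[1]]
    b <- strsplit(b, "")[[1]]
    ## d[i + 1, j + 1] holds the distance between the first i characters
    ## of a and the first j characters of b.
    d <- matrix(0L, length(a) + 1L, length(b) + 1L)
    d[, 1] <- 0:length(a)
    d[1, ] <- 0:length(b)
    for (i in seq_along(a)) {
        for (j in seq_along(b)) {
            cost <- if (a[i] == b[j]) 0L else 1L
            d[i + 1L, j + 1L] <- min(d[i, j + 1L] + 1L,  # deletion
                                     d[i + 1L, j] + 1L,  # insertion
                                     d[i, j] + cost)     # substitution or match
        }
    }
    d[length(a) + 1L, length(b) + 1L]
}

x <- c("x", "y", "ax1", "abx", "x12", "ax12", "abx1")
d_x   <- sapply(x, function(s) lev("x", s),   USE.NAMES = FALSE)
d_ax1 <- sapply(x, function(s) lev("ax1", s), USE.NAMES = FALSE)

which(d_x <= 2)    # 1 2 3 4 5
which(d_ax1 <= 2)  # 1 3 4 5 6 7
```

Under these assumptions "x12" is at distance 2 from both patterns, which is why I would expect element 5 to show up in both results.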
In any case, I think the patched behavior is more in line with the
documentation. I'm very new to hacking on R, so please let me know if this
isn't the right way to submit patches.
Daniel
Index: src/library/base/R/grep.R
===================================================================
--- src/library/base/R/grep.R (revision 53625)
+++ src/library/base/R/grep.R (working copy)
@@ -93,6 +93,11 @@
n <- nchar(pattern, "c")
if(is.na(n)) stop("invalid multibyte string for 'pattern'")
+
+ ## make pattern match the whole string
+ pattern <- gsub("\\", "\\\\", pattern, fixed=TRUE)
+ pattern <- paste("^", pattern, "$", sep="")
+
if(!is.list(max.distance)) {
if(!is.numeric(max.distance) || (max.distance < 0))
stop("'max.distance' must be non-negative")
Index: src/main/agrep.c
===================================================================
--- src/main/agrep.c (revision 53625)
+++ src/main/agrep.c (working copy)
@@ -42,7 +42,7 @@
regex_t reg;
regaparams_t params;
regamatch_t match;
- int rc, cflags = REG_NOSUB | REG_LITERAL;
+ int rc, cflags = REG_NOSUB;
checkArity(op, args);
pat = CAR(args); args = CDR(args);
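To make the R-side change concrete, the sketch below applies the same two added lines to a few sample patterns. `anchor_pattern` is a hypothetical wrapper name used only for illustration; the patch itself modifies `pattern` in place. Only backslashes are escaped, so I would assume that other regex metacharacters in the pattern (such as "." or "*") are no longer matched literally once REG_LITERAL is dropped. The `grepl` call at the end only illustrates that regex behavior by analogy; it does not go through the patched agrep code.

```r
## Same two statements as in the grep.R hunk, wrapped in a hypothetical
## helper so the effect on a pattern string can be inspected.
anchor_pattern <- function(pattern) {
    ## double each backslash so it stays a literal character in the regex
    pattern <- gsub("\\", "\\\\", pattern, fixed = TRUE)
    ## anchor at both ends so the match must cover the whole string
    paste("^", pattern, "$", sep = "")
}

anchor_pattern("ax1")          # "^ax1$"
nchar(anchor_pattern("a\\b"))  # the single backslash in a\b is doubled
anchor_pattern("a.b")          # "^a.b$": the dot is left unescaped

## By analogy only: under ordinary regex matching the unescaped dot
## matches any character.
grepl(anchor_pattern("a.b"), "axb")  # TRUE
```

If literal matching of the whole pattern is wanted, the remaining metacharacters would presumably need escaping as well.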
Daniel Dickison
Research Programmer
ddickison at carnegielearning.com