[R] Bug in agrep computing edit distance?
Dickison, Daniel
ddickison at carnegielearning.com
Wed Nov 17 00:47:06 CET 2010
The documentation for agrep says it uses the Levenshtein edit distance,
but it seems to get this wrong in certain cases when there is a
combination of deletions and substitutions. For example:
> agrep("abcd", "abcxyz", max.distance=1)
[1] 1
That should have been a no-match. The Levenshtein distance between those
strings is 3 (1 substitution and 2 deletions, going from "abcxyz" to "abcd"),
but agrep reports a match for any max.distance >= 1.
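To double-check the distance I expected, here is a minimal sketch of the
standard dynamic-programming Levenshtein computation in base R. The name
lev_dist is a hypothetical helper of mine, not part of R, and it compares the
two whole strings with unit costs, which is my assumption about what the
documentation means:

```r
# Whole-string Levenshtein distance via the usual dynamic-programming table.
# lev_dist is a hypothetical helper, not a base R function.
lev_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[n + 1, m + 1]
}

lev_dist("abcd", "abcxyz")  # 3
```

If agrep instead compares the pattern against substrings of the target, a
smaller distance would be possible (for example, "abc" is one edit away from
"abcd"), but I have not confirmed that this is what happens here.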
I didn't find anything in the bug database, so I was wondering if somehow
I'm misinterpreting how agrep works. If not, should I file this in
Bugzilla?
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.12.0
Daniel Dickison
Research Programmer
ddickison at carnegielearning.com