agrep {base} | R Documentation |

Searches for approximate matches to `pattern` (the first argument) within each element of the character vector `x` (the second argument) using the generalized Levenshtein edit distance (the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform one string into another).
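The edit distance described above can be computed with the standard dynamic-programming recurrence. The following is a minimal sketch in base R, not the `tre`-based code that `agrep` actually uses. The function name `weighted_levenshtein` and its `ins`/`del`/`sub` arguments are hypothetical, chosen to mirror the three weights that `costs` accepts. It compares two whole strings.

```r
# Hypothetical helper (not part of base R): generalized Levenshtein distance
# between two single strings, with a separate weight for each transformation.
weighted_levenshtein <- function(from, to, ins = 1, del = 1, sub = 1) {
  a <- strsplit(from, "")[[1]]
  b <- strsplit(to, "")[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] holds the cheapest way to turn a[1:i] into b[1:j]
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- (0:n) * del   # delete every character of 'from'
  d[1, ] <- (0:m) * ins   # insert every character of 'to'
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + del,                       # delete a[i]
        d[i + 1, j] + ins,                       # insert b[j]
        d[i, j] + if (a[i] == b[j]) 0 else sub   # keep a[i] or substitute it
      )
    }
  }
  d[n + 1, m + 1]
}

weighted_levenshtein("kitten", "sitting")      # 3 under unit costs
weighted_levenshtein("lasy", "lazy", sub = 3)  # 2: a deletion plus an insertion beats one substitution
```

For real work, `adist()` in package utils computes this kind of whole-string distance and takes a `costs` argument of the same form, so `adist("kitten", "sitting")` should also report 3.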

```r
agrep(pattern, x, max.distance = 0.1, costs = NULL,
      ignore.case = FALSE, value = FALSE, fixed = TRUE,
      useBytes = FALSE)

agrepl(pattern, x, max.distance = 0.1, costs = NULL,
       ignore.case = FALSE, fixed = TRUE, useBytes = FALSE)
```

`pattern` |
a non-empty character string, or a character string containing a regular expression (for `fixed = FALSE`), to be matched. Coerced by `as.character` to a string if possible. |

`x` |
character vector where matches are sought. Coerced by `as.character` to a character vector if possible. |

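`max.distance` |
Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (which will be replaced by the smallest integer not less than the corresponding fraction of the pattern length), or as a list with possible components

- `cost`: maximum number/fraction of match cost (generalized Levenshtein distance)
- `all`: maximal number/fraction of *all* transformations (insertions, deletions and substitutions)
- `insertions`: maximum number/fraction of insertions
- `deletions`: maximum number/fraction of deletions
- `substitutions`: maximum number/fraction of substitutions

If `cost` is not given, `all` defaults to 10%, and the other transformation number bounds default to `all`. The component names can be abbreviated. |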

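`costs` |
a numeric vector or list with names partially matching `insertions`, `deletions` and `substitutions` giving the respective costs for computing the generalized Levenshtein distance, or `NULL` (default) indicating unit cost for all three possible transformations. Coerced to integer via `as.integer` if possible. |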

`ignore.case` |
if `FALSE`, the pattern matching is case sensitive; if `TRUE`, case is ignored during matching. |

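`value` |
if `FALSE`, a vector containing the (integer) indices of the matches is returned; if `TRUE`, a vector containing the matching elements themselves is returned. |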

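`fixed` |
logical. If `TRUE` (default), the pattern is matched literally (as is). Otherwise, it is matched as a regular expression. |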

`useBytes` |
logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte? |
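The fractional form of `max.distance` is easiest to see with numbers. This is a minimal sketch of how a fraction could turn into an integer bound, assuming the rule stated for `max.distance`: the fraction is multiplied by the pattern length and the maximal transformation cost, then rounded up. The helper name `hypothetical_bound` is made up for illustration, and the computation inside `agrep` itself may differ in detail.

```r
# Hypothetical helper (not a base R function): convert a scalar
# max.distance into an integer bound, following the documented rule.
hypothetical_bound <- function(max.distance, pattern,
                               costs = c(insertions = 1, deletions = 1,
                                         substitutions = 1)) {
  if (max.distance >= 1) {
    return(as.integer(max.distance))  # already an absolute bound
  }
  # A fraction: scale by pattern length and the largest cost, then round up
  as.integer(ceiling(max.distance * nchar(pattern) * max(costs)))
}

hypothetical_bound(0.1, "lasy")   # default 0.1 on a 4-character pattern gives 1
hypothetical_bound(2, "laysy")    # an integer bound passes through: 2
hypothetical_bound(0.5, "laysy")  # ceiling(2.5) = 3
```

Under this reading, the default `max.distance = 0.1` with unit costs allows a single edit for any pattern of up to 10 characters. That is why `agrep("lasy", "1 lazy 2")` accepts the one-substitution match.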

The Levenshtein edit distance is used as the measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.

This uses the `tre` library by Ville Laurikari (http://laurikari.net/tre/), which supports matching of multibyte (MBCS) characters.

The main effect of `useBytes` is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as `"bytes"` (see `Encoding`).

`agrep` returns a vector giving the indices of the elements that yielded a match, or, if `value` is `TRUE`, the matched elements (after coercion, preserving names but no other attributes).

`agrepl` returns a logical vector indicating, for each element of `x`, whether it yielded a match.
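The three forms of result are related in a simple way. The short check below uses the pattern from the first example; the named vector `x` is made up for illustration.

```r
# Illustrative data (names made up for this example)
x <- c(a = "1 lazy 2", b = "1", c = "1 lasy 2")

idx  <- agrep("lasy", x)                # indices of matching elements
hits <- agrep("lasy", x, value = TRUE)  # matching elements, names preserved
lgl  <- agrepl("lasy", x)               # one TRUE/FALSE per element of x

# The indices are exactly the positions where agrepl() is TRUE
stopifnot(length(idx) == sum(lgl), all(idx == which(lgl)))

# value = TRUE is equivalent to subsetting x by those indices
stopifnot(identical(hits, x[idx]))
```

Here `"1 lazy 2"` matches with one substitution, `"1 lasy 2"` matches exactly, and `"1"` is too far from the pattern.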

Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of `x` (just as `grep` does) and **not** whole elements. See also `adist` in package utils, which optionally returns the offsets of the matched substrings.
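Substring matching and whole-string distance can differ a lot. The sketch below performs approximate substring search with the classic dynamic-programming variant in which a match may start and end anywhere in the text, under unit costs. It is an illustration only, not how `tre` is implemented, and `substring_distance` is a hypothetical name.

```r
# Hypothetical helper (not a base R function): smallest unit-cost edit
# distance between 'pattern' and any substring of 'text'.
substring_distance <- function(pattern, text) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(text, "")[[1]]
  n <- length(p)
  m <- length(s)
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # pattern prefix against an empty substring
  # d[1, ] stays 0: a match may begin at any position in 'text' for free
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,            # delete p[i]
                             d[i + 1, j] + 1,            # insert s[j]
                             d[i, j] + (p[i] != s[j]))   # keep or substitute
    }
  }
  min(d[n + 1, ])  # a match may also end at any position in 'text'
}

substring_distance("lasy", "1 lazy 2")  # 1: the substring "lazy" needs one substitution
```

The whole-string distance between `"lasy"` and `"1 lazy 2"` is 5 (four insertions plus one substitution), which would exceed the default bound. The substring-level distance of 1 is what lets the first example match. `adist("lasy", "1 lazy 2", partial = TRUE)` should likewise report 1.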

Original version in **R** < 2.10.0 by David Meyer.
Current version by Brian Ripley and Kurt Hornik.

`grep`, `adist`.
A different interface to approximate string matching is provided by `aregexec()`.

```r
agrep("lasy", "1 lazy 2")
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max = list(sub = 0))
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE)
```

[Package *base* version 3.5.0 Index]