[Rd] Pb with agrep()

Martin Maechler maechler at stat.math.ethz.ch
Thu Jan 5 10:02:21 CET 2006

>>>>> "Herve" == Herve Pages <hpages at fhcrc.org>
>>>>>     on Wed, 04 Jan 2006 17:29:35 -0800 writes:

    Herve> Happy new year everybody,
    Herve> I'm getting the following while trying to use the agrep() function:

    >> pattern <- "XXX"
    >> subject <- c("oooooo", "oooXooo", "oooXXooo", "oooXXXooo")
    >> max <- list(ins=0, del=0, sub=0) # I want exact matches only
    >> agrep(pattern, subject, max=max)
    Herve> [1] 4

    Herve> OK

    >> max$sub <- 1 # One allowed substitution
    >> agrep(pattern, subject, max=max)
    Herve> [1] 3 4

    Herve> OK

    >> max$sub <- 2 # Two allowed substitutions
    >> agrep(pattern, subject, max=max)
    Herve> [1] 3 4

    Herve> Wrong!

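You have overlooked the fact that the 'all' component of
'max.distance' *keeps* its default of 0.1 (10%) when 'max.distance'
is specified as a list without 'all', as in your example.  For your
3-character pattern, 10% rounds up to an overall distance of 1, so
at most one edit is allowed in total, whatever max$sub says.
From "?agrep":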

>> max.distance: Maximum distance allowed for a match.  Expressed either
>>           as integer, or as a fraction of the pattern length (will be
>>           replaced by the smallest integer not less than the
>>           corresponding fraction), or a list with possible components
>>           'all': maximal (overall) distance
>>           'insertions': maximum number/fraction of insertions
>>           'deletions': maximum number/fraction of deletions
>>           'substitutions': maximum number/fraction of substitutions
>>>>>>       If 'all' is missing, it is set to 10%, the other components
>>>>>>       default to 'all'.  The component names can be abbreviated. 

If you specify max$all as "100%", i.e., as 0.9999 (note: '< 1'!), everything
works as you expect:

agrep(pattern, subject, max = list(ins=0, del=0, sub= 2, all = 0.9999))
## --> 2 3 4
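
The arithmetic behind both results can be checked by hand. The following is
only a sketch of the rounding rule as quoted from "?agrep" above, not
agrep()'s actual implementation; the helper name distance_bound() is
hypothetical, and it assumes that a value below 1 is read as a fraction of
the pattern length while a value of 1 or more is read as a count.

```r
# Sketch of the rounding rule quoted from ?agrep; NOT agrep()'s own code.
pattern <- "XXX"

# Hypothetical helper: turn a max.distance value into an integer bound.
# Assumption: values below 1 are fractions of the pattern length and are
# rounded up ("smallest integer not less than"); values >= 1 are counts.
distance_bound <- function(x, pattern) {
  if (x < 1) ceiling(x * nchar(pattern)) else as.integer(x)
}

default_all <- distance_bound(0.1, pattern)     # the default 'all' of 10%
full_all    <- distance_bound(0.9999, pattern)  # the "100%" workaround

default_all  # 1: one edit in total, so sub = 2 can never be used up
full_all     # 3: the overall bound no longer limits sub = 2
```

Under these assumptions the default 'all' caps the total distance at 1 for a
3-character pattern, which is why raising max$sub from 1 to 2 changed nothing.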

Martin Maechler, ETH Zurich
