[Rd] Fix for bug in match()

Andreas Borg borg at imbei.uni-mainz.de
Mon Jan 18 13:30:51 CET 2010


Hello all,

I posted the following bug last week:

# These calls work correctly:
match(c("A", "B", "C"), c("A","C"), incomparables=NA) # okay
match(c("A", "B", "C"), "A") # okay
match("A", c("A", "B"), incomparables=NA) # okay

# This one causes R to hang:
match(c("A", "B", "C"), "A", incomparables=NA)

I found the reason in the hash table implementation in unique.c. Values 
in the table argument of match are stored in a hash table. Incomparables 
are then removed by (potentially multiple) calls to removeEntry():

static void removeEntry(SEXP table, SEXP x, int indx, HashData *d)
{
    int i, *h;

    h = INTEGER(d->HashTable);
    i = d->hash(x, indx, d);
    while (h[i] != NIL) {
    if (d->equal(table, h[i], x, indx)) {
        h[i] = NA_INTEGER;  /* < 0, only index values are inserted */
        return;
    }
    i = (i + 1) % d->M;
    }
    h[i] = NA_INTEGER;
}

Removing a value x sets the corresponding cell to NA_INTEGER. If x is 
not element of the table, the cell where it would have been is set from 
NIL (-1) to NA_INTEGER. Therefore, subsequent calls to removeEntry() 
with values that are not element of the table can remove all initial NIL 
values from the table cells. Another call of removeEntry() or Lookup() 
then leads to an infinte loop because the condition

while (h[i] != NIL)

is never false.

As a fix, I propose to reset cells to NIL when removing values, which 
leads to the following definition:

static void removeEntry(SEXP table, SEXP x, int indx, HashData *d)
{
    int i, *h;

    h = INTEGER(d->HashTable);
    i = d->hash(x, indx, d);
    while (h[i] != NIL) {
    if (d->equal(table, h[i], x, indx)) {
        h[i] = NIL;  /* < 0, only index values are inserted */
        return;
    }
    i = (i + 1) % d->M;
    }
}

I would have submitted a patch file, but I couldn't checkout from the 
svn repository.

Cheers,

Andreas





-- 
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Telefon +49 (0) 6131 175062
E-Mail: borg at imbei.uni-mainz.de

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der
richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den
Absender und löschen Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe
dieser Mail und der darin enthaltenen Informationen ist nicht gestattet.



More information about the R-devel mailing list