[R] unique possible bug

Patrick McCann patmmccann at gmail.com
Wed Oct 5 22:15:18 CEST 2011


Hi,

I am trying to read in a rather large list of transactions using the
arules package. It seems that the coercion method to dgCMatrix calls
unique somewhere. unique.c throws an error when n > 536870912 (2^29);
however, when 4*n was changed to 2*n in 2004, shouldn't the overflow
protection have changed from 2^29 to 2^30? If so, how would I change
it in my copy? Do I have to recompile everything?

Thanks,
Patrick McCann


Here is a simple example that reproduces the error:
> runif(2^29+5)->a
> sum(unique(a))->b
Error in unique.default(a) : length 536870917 is too large for hashing
> traceback()
3: unique.default(a)
2: unique(a)
1: unique(a)
> unique.default
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    z <- .Internal(unique(x, incomparables, fromLast))
    if (is.factor(x))
        factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
            ordered = is.ordered(x))
    else if (inherits(x, "POSIXct"))
        structure(z, class = class(x), tzone = attr(x, "tzone"))
    else if (inherits(x, "Date"))
        structure(z, class = class(x))
    else z
}
<environment: namespace:base>

From http://svn.r-project.org/R/trunk/src/main/unique.c I see:


/*
 Choose M to be the smallest power of 2
 not less than 2*n and set K = log2(M).
 Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29.

 Dec 2004: modified from 4*n to 2*n, since in the worst case we have
 a 50% full table, and that is still rather efficient -- see
 R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
   int n4 = 2 * n;

   if(n < 0 || n > 536870912) /* protect against overflow to -ve */
       error(_("length %d is too large for hashing"), n);
   d->M = 2;
   d->K = 1;
   while (d->M < n4) {
       d->M *= 2;
       d->K += 1;
   }
}
