[R] unique possible bug

Patrick McCann patmmccann at gmail.com
Thu Oct 6 16:50:42 CEST 2011


The error I am referring to is in unique.c in Base R, it cannot
accomodate greater than 2^29 values, even though it appears the
overflow protection should be 2^30. The only relevance of the arules
package is I was using it while I discovered this issue.

Thanks,
Patrick



2011/10/6 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
>
>
> On 05.10.2011 22:15, Patrick McCann wrote:
>>
>> Hi,
>>
>> I am trying to read in a rather large list of transactions using the
>> arules library.
>
> You mean the arules package?
>
>
>> It seems in the coerce method into the dgCmatrix, it
>> somewhere calls unique. Unique.c throws an error when  n>  536870912;
>> however, when 4*n was modified to 2*n in 2004, the overflow protection
>> should have changed from 2^29 to 2^30, right? If so, how would I
>> change it in my copy? Do I have to recompile everything?
>
> Yes.
>
> There is also the way to ask the maintainer to improve it, but it won't work
> without reinstallation of the changed package sources.
>
> Uwe Ligges
>
>
>>
>> Thanks,
>> Patrick McCann
>>
>>
>> Here is a simple to reproduce example:
>>>
>>> runif(2^29+5)->a
>>> sum(unique(a))->b
>>
>> Error in unique.default(a) : length 536870917 is too large for hashing
>>>
>>> traceback()
>>
>> 3: unique.default(a)
>> 2: unique(a)
>> 1: unique(a)
>>>
>>> unique.default
>>
>> function (x, incomparables = FALSE, fromLast = FALSE, ...)
>> {
>>     z<- .Internal(unique(x, incomparables, fromLast))
>>     if (is.factor(x))
>>         factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
>>             ordered = is.ordered(x))
>>     else if (inherits(x, "POSIXct"))
>>         structure(z, class = class(x), tzone = attr(x, "tzone"))
>>     else if (inherits(x, "Date"))
>>         structure(z, class = class(x))
>>     else z
>> }
>> <environment: namespace:base>
>>
>>> From http://svn.r-project.org/R/trunk/src/main/unique.c I see:
>>
>>
>> /*
>>  Choose M to be the smallest power of 2
>>  not less than 2*n and set K = log2(M).
>>  Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29.
>>
>>  Dec 2004: modified from 4*n to 2*n, since in the worst case we have
>>  a 50% full table, and that is still rather efficient -- see
>>  R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
>> */
>> static void MKsetup(int n, HashData *d)
>> {
>>    int n4 = 2 * n;
>>
>>    if(n<  0 || n>  536870912) /* protect against overflow to -ve */
>>        error(_("length %d is too large for hashing"), n);
>>    d->M = 2;
>>    d->K = 1;
>>    while (d->M<  n4) {
>>        d->M *= 2;
>>        d->K += 1;
>>    }
>> }
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list