[Rd] MKsetup() limited to hashing with 32 bits

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Feb 1 11:35:00 CET 2010


On Wed, 13 Jan 2010, Benjamin Tyner wrote:

> The MKsetup() in unique.c throws an error if the vector to be hashed is 
> longer than (2^32)/8:
>
>   if(n < 0 || n > 536870912) /* protect against overflow to -ve */
>       error(_("length %d is too large for hashing"), n);
>
> I occasionally work with vectors longer than this on 64-bit builds. Would it 
> be too much to ask that R can take advantage of all 64 bits for hashing when 
> compiled as such?

'All 64 bits' of what?  All systems we use have 64 bit integer types, 
but there are good reasons not to use them where not needed, and 'int' 
is not 64-bit on any R platform.  I don't see the connection to 64-bit 
pointers, which is what is most often meant by a '64-bit build'.
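The limit in the quoted check is easiest to read next to the table-sizing loop it protects. Below is a minimal C sketch of that loop. It assumes, from a reading of unique.c rather than from this thread, that MKsetup() chooses the smallest power-of-two table size M = 2^K with M >= 2n. The names size_table and TableSize are hypothetical, and the sketch returns -1 where R would call error(). With a 32-bit signed 'int', 536870912 = 2^29 is the largest n for which M, at most 2^30, can still be represented.

```c
#include <assert.h>

/* Hypothetical struct holding only the sizing fields; R's real
   HashData has more members. */
typedef struct {
    int M; /* number of hash slots, a power of two */
    int K; /* log2(M): number of hash bits actually used */
} TableSize;

/* Sketch of MKsetup-style sizing: pick the smallest M = 2^K with
   M >= 2n, so the table is at most half full. Returns 0 on success,
   or -1 (instead of R's error()) when n is out of range. */
int size_table(int n, TableSize *d)
{
    /* n <= 2^29 keeps 2n <= 2^30 and therefore M <= 2^30. One more
       doubling would reach 2^31, which overflows a 32-bit signed int. */
    if (n < 0 || n > 536870912)
        return -1;
    int n2 = 2 * n;
    d->M = 2;
    d->K = 1;
    while (d->M < n2) {
        d->M *= 2;
        d->K += 1;
    }
    assert(d->M == (1 << d->K)); /* invariant: M is exactly 2^K */
    return 0;
}
```

For example, size_table(3, &d) gives M = 8 and K = 3, and n = 2^29 gives M = 2^30 and K = 30. Any larger n would need K = 31, and the loop would overflow.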

Efficiency would be a major consideration with such long vectors. 
What type(s) are you contemplating, and are they full of duplicates? 
If the latter, we could simply allow K=29.  Otherwise likely a new 
approach would be needed.
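K here is the number of hash bits, so the table has 2^K slots. The following minimal C sketch shows one common way to turn a key into a K-bit slot index: Knuth-style multiplicative hashing, which keeps the top K bits of a 32-bit product. R's unique.c is believed to use a scheme of this kind, but treat that as an assumption. The constant (Knuth's golden-ratio multiplier) and the name hash_slot are illustrative choices, not R's code. The sketch makes the "32 bits" in the subject concrete: the product is only 32 bits wide, so K is bounded by the word size whatever the vector length.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hashing: multiply the key by a large odd 32-bit
   constant (the product wraps modulo 2^32), then keep the top K bits.
   The result is a slot index in [0, 2^K). Requires 1 <= K <= 31 so the
   shift is well defined and the index fits in a signed int. */
int hash_slot(uint32_t key, int K)
{
    assert(K >= 1 && K <= 31);
    uint32_t product = 2654435769u * key; /* floor(2^32 / golden ratio) */
    return (int)(product >> (32 - K));
}
```

Capping K caps the table at 2^K slots regardless of n. If the vector is full of duplicates, the load is set by the number of distinct keys rather than by n, which is presumably why a fixed K would be acceptable in that case.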

I think the way forward is for you to do some experiments and submit 
proposed code changes with supporting evidence.  (It seems only you are 
interested.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list