[R-pkgs] package bit64 with new functionality

Jens Oehlschlägel Jens.Oehlschlaegel at truecluster.com
Thu Nov 8 21:03:20 CET 2012


Dear R community,

The new version of package 'bit64' - which extends R with fast 64-bit 
integers - now has fast (single-threaded) implementations of the most 
important univariate algorithmic operations (those based on hashing and 
sorting). Package 'bit64' now has methods for 'match', '%in%', 
'duplicated', 'unique', 'table', 'sort', 'order', 'rank', 'quantile', 
'median' and 'summary'. Regarding data management it has novel generics 
'unipos' (positions of the unique values), 'tiepos' (positions of ties), 
'keypos' (positions of values in a sorted unique table) and derived 
methods 'as.factor' and 'as.ordered'. This 64-bit functionality is 
implemented carefully to be not slower than the respective 32-bit 
operations in Base R and also to avoid excessive execution times 
observed with 'order', 'rank' and 'table' (speedup factors 20/16/200 
respective). This increases the dataset size with wich we can work truly 
interactive. The speed is achieved by simple heuristic optimizers: the 
mentioned high-level functions choose the best from multiple low-level 
algorithms and further take advantage of a novel optional caching 
method. In an example R session using a couple of these operations the 
64-bit integers performed 22x faster than base 32-bit integers, 
hash-caching improved this to 24x amortized, sortorder-caching was most 
efficient with 38x (caching both, hashing and sorting is not worth it 
with 32x at duplicated RAM consumption).

Since the package covers the most important functions for (univariate) 
data exploration and data management, I think it is now appropriate to 
claim that R has sound 64-bit integer support, for example for working 
with keys or counts imported from large databases. For details 
concerning approach, implementation and roadmap please check the 
ANNOUNCEMENT-0.9-Details.txt file and the package help files.

Kind regards


Jens Oehlschlägel
Munich, 8.11.2012



More information about the R-packages mailing list