[R-pkgs] package bit64 with new functionality
Jens Oehlschlägel
Jens.Oehlschlaegel at truecluster.com
Thu Nov 8 21:03:20 CET 2012
Dear R community,
The new version of package 'bit64' - which extends R with fast 64-bit
integers - now has fast (single-threaded) implementations of the most
important univariate algorithmic operations (those based on hashing and
sorting). Package 'bit64' now has methods for 'match', '%in%',
'duplicated', 'unique', 'table', 'sort', 'order', 'rank', 'quantile',
'median' and 'summary'. For data management it has the novel generics
'unipos' (positions of the unique values), 'tiepos' (positions of ties)
and 'keypos' (positions of values in a sorted unique table), plus the
derived methods 'as.factor' and 'as.ordered'. This 64-bit functionality
is carefully implemented to be no slower than the respective 32-bit
operations in Base R, and also to avoid the excessive execution times
observed with 'order', 'rank' and 'table' (speedup factors of 20, 16 and
200 respectively). This increases the dataset size with which we can work
truly interactively. The speed is achieved by simple heuristic optimizers:
the high-level functions mentioned above choose the best of multiple
low-level algorithms and can additionally take advantage of a novel
optional caching method. In an example R session using several of these
operations, the 64-bit integers performed 22x faster than base 32-bit
integers; hash-caching improved this to 24x amortized, and sortorder-caching
was the most efficient at 38x (caching both hashing and sorting is not
worth it: 32x at double the RAM consumption).
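The following is a minimal base-R sketch (not bit64 code) of what the three new generics compute, shown on an ordinary integer vector. It assumes that "positions of ties" means all elements that share their value with at least one other element. The *_sketch function names are hypothetical; bit64's own 'unipos', 'tiepos' and 'keypos' work on integer64 vectors and choose between hashing- and sorting-based algorithms internally.

```r
# Base-R sketch of the semantics of the new bit64 generics.
# The *_sketch names are hypothetical helpers, not part of bit64.

# Positions of the values that unique() returns (first occurrences).
unipos_sketch <- function(x) which(!duplicated(x))

# Positions of all elements tied with at least one other element
# (assumed reading of "positions of ties").
tiepos_sketch <- function(x) which(duplicated(x) | duplicated(x, fromLast = TRUE))

# Position of each value in the sorted table of unique values.
keypos_sketch <- function(x) match(x, sort(unique(x)))

# A factor derived from key positions: the codes are the key positions,
# the levels are the sorted unique values (one plausible derivation).
as_factor_sketch <- function(x) {
  keys <- sort(unique(x))
  structure(match(x, keys), levels = as.character(keys), class = "factor")
}

x <- c(30L, 10L, 30L, 20L, 10L, 40L)
unipos_sketch(x)    # 1 2 4 6
tiepos_sketch(x)    # 1 2 3 5
keypos_sketch(x)    # 3 1 3 2 1 4
as_factor_sketch(x) # same as factor(x)
```

All three helpers reduce to either a duplicate check (hashing) or a lookup in a sorted unique table (sorting), which is why caching a hash table or a sort order can amortize across several of these calls.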
Since the package covers the most important functions for (univariate)
data exploration and data management, I think it is now appropriate to
claim that R has sound 64-bit integer support, for example for working
with keys or counts imported from large databases. For details on the
approach, implementation and roadmap, please see the
ANNOUNCEMENT-0.9-Details.txt file and the package help files.
Kind regards
Jens Oehlschlägel
Munich, 8.11.2012