[R-sig-hpc] Big Data packages
Brian G. Peterson
brian at braverock.com
Thu Mar 18 17:10:48 CET 2010
I think if you are looking for a matrix-like replacement, you should
probably look at the 'indexing' package by Jeff Ryan (author of xts,
quantmod, and others). It is very 'R-like' in its usage and subsetting,
holding the 'index' in memory. It turns out to be faster than bigmemory
for most types of access.
- Brian
Andrew Piskorski wrote:
> On Wed, Mar 17, 2010 at 04:26:16PM -0400, Ashwin Kapur wrote:
>
>> Just wondering if anyone has opinions on the various big data packages for
>> R, ff vs bigmemory vs anything else. Is anyone working on or is there
>>
>
> I don't really know. However, since both ff and bigmemory are
> intended for use with giant larger-than-RAM matrices via memory-mapped
> files on disk, back c. October 2009 I briefly tried out both in order
> to answer one question:
>
> Is either package a straightforward drop-in replacement for EXISTING
> code manipulating large R matrices, in order to reduce R's massive
> (and probably quite inefficient) memory use in such cases?
>
> The short answer is no, they're not. Neither one even really attempts
> to work transparently as a matrix in R. Both packages have major
> quirks and special behaviors which in practice seem to mean that you
> must write your code specifically for them. These range from smaller
> things like is.na() or apply() not working to conceptually bigger
> ones like pass-by-reference semantics rather than the pass-by-value
> semantics R uses everywhere else.
>
> And if you're writing special-case code, then other tools, like
> RSQLite or perhaps even Metakit, also become options. Note that I
> have no particular opinion on how useful ff or bigmemory are in
> general; I didn't even attempt to figure that out.
>
> And finally, some other out-there technologies to keep an eye on for
> potential use in massive data manipulation in R (but unlike the
> packages above, these probably are not usable with R right now):
>
> - Had it been completed, Jean-Claude Wippler's Vlerq might well have been very
> useful for R, perhaps even as a unification of and upgrade to R's
> native matrix, array, and data frame data structures. Unfortunately
> that project is dead. It also sounded in some ways like what Kdb/Q do.
>
> - MonetDB is interesting, but may be too server-like for embedded use
> from R.
>
> - Alex van Ballegooij's "RAM" (Relational Array Mapping) extension for
> MonetDB sounds potentially relevant for R-like use of matrices, but
> it's not clear whether it actually worked for anything other than
> his PhD thesis.
> http://www.cwi.nl/en/2009/1026/New-array-database-technology-for-scientists
>
> - If SciDB gets anywhere, it might end up useful as an out-of-core
> multi-dimensional matrix back-end for R, even though it is intended
> more as an RDBMS server than as a lightweight library.
>
>
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock