[Rd] Arrays Partial unserialization

Jeff Ryan jeff.a.ryan at gmail.com
Fri Aug 31 17:01:57 CEST 2012


There is no such tool to my knowledge, though the mmap package can do
very similar things.  In fact, it will be able to do this exactly once
I apply a contributed patch to handle endianness.

The issue is that rds files are compressed by default, so reading them
directly requires decompressing the stream, which rules out selecting a
subset, at least to the best of my knowledge of the compression
algorithms in use. (BDR's reply after this one clarifies.)

What you can do, though, is writeBin by column and then read the data
back incrementally.  Take a look at the mmap package, specifically:

example(mmap)
example(struct)
example(types)
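
To make the writeBin-by-column idea concrete, here is a minimal sketch in
base R only (it does not use mmap). The file name, the matrix dimensions
and the helper read_chunk() are made up for illustration, and it assumes
the file is written and read on the same machine, so endianness is not an
issue.

```r
# Hypothetical data and file, for illustration only.
path <- tempfile(fileext = ".bin")
x <- matrix(as.double(1:20), nrow = 5, ncol = 4)

# Write the matrix one column at a time as raw 8-byte doubles.
con <- file(path, open = "wb")
for (j in seq_len(ncol(x))) {
  writeBin(x[, j], con, size = 8L)
}
close(con)

# Read n values starting at 1-based element `start`,
# without loading the rest of the file.
read_chunk <- function(path, start, n, size = 8L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (start - 1) * size, origin = "start")
  readBin(con, what = "double", n = n, size = size)
}

# Pull out the third column only.
col3 <- read_chunk(path, start = 2 * nrow(x) + 1, n = nrow(x))

# Step through the whole file in chunks of 6 cells, keeping only a
# running sum in memory.
total <- 0
chunk <- 6L
n_total <- length(x)
for (start in seq(1L, n_total, by = chunk)) {
  n <- min(chunk, n_total - start + 1L)
  total <- total + sum(read_chunk(path, start, n))
}
```

Re-opening the connection for every chunk keeps the sketch short; for real
use you would probably keep one connection open and read sequentially.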

The struct one is quite useful for data.frame-like structures on disk,
including the ability to modify struct padding etc.  It is more
row oriented, so it lets you store various types in row-oriented
fashion in one file.
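
For a feel of what a row-oriented layout means on disk, here is a rough
base-R sketch. It is not the mmap struct API; the record layout (a 4-byte
integer followed by an 8-byte double, no padding) and the helper
read_record() are assumptions chosen purely for illustration.

```r
# Hypothetical records: an integer id and a double value per row.
path <- tempfile(fileext = ".bin")
ids  <- 1:4
vals <- c(0.5, 1.5, 2.5, 3.5)
rec_size <- 4L + 8L  # bytes per record, assuming no padding

# Write the fields of each record next to each other.
con <- file(path, open = "wb")
for (i in seq_along(ids)) {
  writeBin(ids[i], con, size = 4L)
  writeBin(vals[i], con, size = 8L)
}
close(con)

# Jump straight to record k (1-based) and read just that record.
read_record <- function(path, k) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (k - 1) * rec_size, origin = "start")
  list(id    = readBin(con, what = "integer", n = 1L, size = 4L),
       value = readBin(con, what = "double",  n = 1L, size = 8L))
}

rec3 <- read_record(path, 3)
```

Because every record has the same size, the offset of any row is simple
arithmetic, which is what makes random access to rows cheap.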

?mmap.csv is an example function that will also let you read csv files
directly into an 'mmap' form - and shows the 'struct' functionality.

At some point I will write an article on all of this, but the vignette
for mmap is illustrative of most of the value.

The indexing package on R-forge (as well as talks about it given by me
at useR 2010 and R/Finance 2012) may also be of use, though that is
more of a 'database' approach than a simple sequential stepping
through data on disk.

HTH
Jeff

On Fri, Aug 31, 2012 at 9:41 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 31/08/2012 9:47 AM, Damien Georges wrote:
>>
>> Hi all,
>>
>> I'm working with some huge arrays in R and I need to load several of them to
>> apply some functions that require all of my arrays' values for each
>> cell...
>>
>> To make this possible, I would like to load only a part (for example 100
>> cells) of all my arrays, apply my function, delete all the cells loaded,
>> load the following cells and so on.
>>
>> Is it possible to unserialize (or load) only a defined part of an R array
>> ?
>> Do you know some tools that might help me?
>
>
> I don't know of any tools to do that, but there are tools to maintain large
> objects in files, and load only parts of them at a time, e.g. the ff
> package.  Or you could simply use readBin and writeBin to do the same
> yourself.
>
>>
>> Finally, I did a lot of research to find out how arrays (and all other R
>> objects) are serialized into binary objects, but I found nothing
>> really explaining the algorithms involved. If someone has some information
>> on this topic, I'm interested.
>
>
> You can read the source for this; it is in src/main/serialize.c.
>
> Duncan Murdoch
>
>>
>> Hoping my request is understandable,
>>
>> All the best,
>>
>> Damien.G
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



-- 
Jeffrey Ryan
jeffrey.ryan at lemnica.com

www.lemnica.com


