[R-SIG-Finance] appending new data to a file with the mmap package

Daniel Cegiełka daniel.cegielka at gmail.com
Wed Jul 23 23:21:42 CEST 2014


2014-07-23 22:52 GMT+02:00 Claymore Marshall <claymorejmarshall at gmail.com>:
> This is about Jeff Ryan's mmap package (for which documentation is sparse).
>
> My problem: I'm wondering what an effective way to *append* to an existing
> mmapped file which is already "full of data" would be. e.g. simple case: if
> the file contains 100 rows of data (as represented by mmap in R), I want to
> add say another 5 rows of new data, giving a new total of 105 rows of data.
> Surely many others have dealt with this problem before.  A common case
> would be appending new tick data to a large file via mmap.
>
> A quick search on SO gives a few discussions about remapping (mremap?) the
> mmap in general, as a method for appending to an existing mmapped file, but
> I haven't found that functionality to be available in the R mmap package.

As far as I know, mremap is not portable: it is Linux-specific rather than
POSIX, so it would not be available on e.g. Windows.
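
Where mremap is not available, the usual fallback is to drop the mapping,
grow the file, and map it again. Below is a minimal sketch of that pattern.
It uses Python's standard mmap module, not the R mmap package, and the file
name, the one-float64-per-row layout, and the helper names append_and_remap
and read_rows are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

ROW = struct.Struct("<d")  # assumed layout: one little-endian float64 per row


def append_and_remap(path, old_map, values):
    """Unmap, grow the file by appending, then map the whole file again."""
    old_map.flush()
    old_map.close()  # the old mapping must not be used after this point
    with open(path, "ab") as f:
        f.write(b"".join(ROW.pack(v) for v in values))
    with open(path, "r+b") as f:
        # Length 0 maps the file at its current (new) size.
        return mmap.mmap(f.fileno(), 0)


def read_rows(m):
    """Decode every row of the mapping into a Python float."""
    return [ROW.unpack_from(m, off)[0] for off in range(0, len(m), ROW.size)]


# Hypothetical file holding 100 rows: 0.0, 1.0, ..., 99.0.
path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
with open(path, "wb") as f:
    f.write(b"".join(ROW.pack(float(i)) for i in range(100)))

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)

m = append_and_remap(path, m, [100.0, 101.0, 102.0, 103.0, 104.0])
rows = read_rows(m)
m.close()
print(len(rows))  # 105
```

The cost is that every reader holding the old mapping has to remap as well,
which is the step mremap would otherwise hide.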


> One related approach/solution, but not quite the same, as shown very kindly
> by the author of the code at http://censix.com/, is to update to a
> preallocated mmap in a file with a very large number of rows prefilled with
> NAs. But this approach requires making new files/"volumes" as the existing
> files fill up.  Furthermore, I'm not quite sure if there is a clean way to
> aggregate data for a given security which might be split across multiple
> files.  For example, say I have tick data files for each month; one for
> Jan, one for Feb, one for March, etc.  Now suppose I want to pull into R
> the tick data just for 29 May 2014 to 3 June 2014, which is split across 2
> files?  I have to load into memory both files (or portions of them with
> mmap), and merge the data.  It would seem it would just be easier to append
> to one file to begin with, rather than split across files.

This solution makes sense. It is called partitioning and is used for
large data sets, e.g. ticks in kdb+:

http://code.kx.com/wiki/Cookbook/LoadingFromLargeFiles

I'm afraid, though, that partitioning can hurt performance, for example
when a query spans several files.
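
To make the 29 May to 3 June case concrete, here is a rough sketch of a
query over monthly partitions: work out which month files the range touches,
map each one read-only, and keep only the rows inside the range. This is
plain Python, not kdb+ and not the R mmap package. The "YYYY-MM.bin" file
naming, the (int64 epoch seconds, float64 price) record layout, and the
helpers month_keys, query and write_month are hypothetical.

```python
import datetime as dt
import mmap
import os
import struct
import tempfile

REC = struct.Struct("<qd")  # assumed record: int64 epoch seconds, float64 price
UTC = dt.timezone.utc


def month_keys(start, end):
    """Yield 'YYYY-MM' for every month touched by [start, end]."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield f"{y:04d}-{m:02d}"
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def query(root, start, end):
    """Collect (timestamp, price) rows in [start, end] from monthly files."""
    lo, hi = int(start.timestamp()), int(end.timestamp())
    out = []
    for key in month_keys(start, end):
        path = os.path.join(root, key + ".bin")
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            continue  # missing or empty partition: nothing to map
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                for off in range(0, len(mm), REC.size):
                    ts, price = REC.unpack_from(mm, off)
                    if lo <= ts <= hi:
                        out.append((ts, price))
    return out


def write_month(root, year, month, days):
    """Write one made-up tick per day at 12:00 UTC, price = 100 + day."""
    with open(os.path.join(root, f"{year:04d}-{month:02d}.bin"), "wb") as f:
        for d in range(1, days + 1):
            ts = dt.datetime(year, month, d, 12, tzinfo=UTC)
            f.write(REC.pack(int(ts.timestamp()), 100.0 + d))


root = tempfile.mkdtemp()
write_month(root, 2014, 5, 31)
write_month(root, 2014, 6, 30)

ticks = query(root,
              dt.datetime(2014, 5, 29, tzinfo=UTC),
              dt.datetime(2014, 6, 3, 23, 59, 59, tzinfo=UTC))
print(len(ticks))  # 6
```

Only the two files that overlap the range are opened. The linear scan inside
each file is the simplest thing that works. With sorted timestamps, a binary
search for the start offset would avoid touching most of the pages.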

I hope that there will be more ideas here...
Best regards,
Daniel


