[R-SIG-Finance] appending new data to a file with the mmap package
Daniel Cegiełka
daniel.cegielka at gmail.com
Wed Jul 23 23:21:42 CEST 2014
2014-07-23 22:52 GMT+02:00 Claymore Marshall <claymorejmarshall at gmail.com>:
> This is about Jeff Ryan's mmap package (for which documentation is sparse).
>
> My problem: I'm wondering what an effective way would be to *append* to an
> existing mmapped file that is already "full of data". E.g., a simple case: if
> the file contains 100 rows of data (as represented by mmap in R), I want to
> add, say, another 5 rows of new data, giving a new total of 105 rows.
> Surely many others have dealt with this problem before. A common case
> would be appending new tick data to a large file via mmap.
>
> A quick search on SO turns up a few discussions of remapping (mremap?) the
> mmap in general as a way to append to an existing mmapped file, but I
> haven't found that functionality in the R mmap package.
I'm not sure mremap is portable (e.g. to Windows).
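One way to sidestep mremap (a Linux-specific call, not part of POSIX) is to never resize a live mapping at all: close it, grow the file with ordinary writes, and map the file again. Below is a minimal sketch of that idea in Python, not R, so it does not use the mmap package's API; the 16-byte record layout (int64 timestamp + float64 price) and the function names are hypothetical.

```python
import mmap
import struct

# Hypothetical record layout for one tick: int64 timestamp + float64 price.
RECORD = struct.Struct("<qd")  # 16 bytes, little-endian, no padding


def append_records(path, old_map, rows):
    """Append (timestamp, price) rows to `path` and return a fresh mapping.

    The old mapping (if any) is closed, the file is extended with ordinary
    buffered writes, and the whole file is mapped again, so no mremap call
    is needed.
    """
    if old_map is not None:
        old_map.close()
    with open(path, "ab") as f:
        for ts, price in rows:
            f.write(RECORD.pack(ts, price))
    with open(path, "r+b") as f:
        # Length 0 maps the entire file at its new size. The mmap object keeps
        # its own duplicate of the descriptor, so closing f here is safe.
        return mmap.mmap(f.fileno(), 0)


def n_records(mm):
    return len(mm) // RECORD.size


def read_record(mm, i):
    return RECORD.unpack_from(mm, i * RECORD.size)
```

Re-mapping costs a few system calls per append, so appending in batches (e.g. every N ticks) rather than per tick should keep that overhead small.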
> One related (but not identical) approach, very kindly shown by the author
> of the code at http://censix.com/, is to write into a preallocated mmap
> backed by a file with a very large number of rows prefilled with NAs. But
> this approach requires creating new files/"volumes" as the existing files
> fill up. Furthermore, I'm not sure there is a clean way to aggregate data
> for a given security that might be split across multiple files. For
> example, say I have tick data files for each month: one for Jan, one for
> Feb, one for March, etc. Now suppose I want to pull into R the tick data
> for just 29 May 2014 to 3 June 2014, which is split across 2 files. I have
> to load both files into memory (or portions of them with mmap) and merge
> the data. It would seem easier to append to one file to begin with, rather
> than split the data across files.
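The preallocated-volume idea can be sketched roughly as follows. This is an illustration in Python, not the censix.com code, and it assumes hypothetical 16-byte records where a NaN price marks an unused row, rows are filled strictly in order, and real prices are never NaN. Under those assumptions the first free row can be found by binary search, and a full volume is reported so the caller can start a new file.

```python
import math
import mmap
import struct

RECORD = struct.Struct("<qd")  # hypothetical: int64 timestamp + float64 price
NA = float("nan")              # NaN price marks an unused, prefilled row


def preallocate(path, capacity):
    """Create a volume of `capacity` rows whose prices are all NA."""
    with open(path, "wb") as f:
        f.write(RECORD.pack(0, NA) * capacity)


def open_volume(path):
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), 0)


def first_free_row(mm):
    """Binary-search for the first NA row (assumes rows fill in order)."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        _, price = RECORD.unpack_from(mm, mid * RECORD.size)
        if math.isnan(price):
            hi = mid
        else:
            lo = mid + 1
    return lo


def append_rows(mm, rows):
    """Overwrite the next NA slots with `rows`; return the new row count."""
    start = first_free_row(mm)
    capacity = len(mm) // RECORD.size
    if start + len(rows) > capacity:
        raise ValueError("volume full: a new file/volume is needed")
    for k, (ts, price) in enumerate(rows):
        RECORD.pack_into(mm, (start + k) * RECORD.size, ts, price)
    return start + len(rows)
```

The mapping never changes size here, which is exactly why the approach needs new volumes once the NA rows run out.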
This solution makes sense; it is called partitioning and is used for large
data sets, e.g. tick data in kdb+:
http://code.kx.com/wiki/Cookbook/LoadingFromLargeFiles
I'm afraid, though, that partitioning can hurt performance, etc.
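With time-sorted monthly partitions, the 29 May to 3 June query from the question only needs to touch two files, and only part of each. A rough Python sketch (not how kdb+ implements partitions), assuming one file per calendar month with a hypothetical naming scheme and the same hypothetical 16-byte records sorted by epoch-second timestamp: each file is mapped read-only and binary search finds the slice inside [start, end).

```python
import datetime as dt
import mmap
import os
import struct

RECORD = struct.Struct("<qd")  # hypothetical: int64 epoch seconds + float64 price


def partition_path(root, year, month):
    # Hypothetical naming scheme: one file per calendar month.
    return os.path.join(root, f"ticks_{year:04d}-{month:02d}.bin")


def months_between(start, end):
    """Yield (year, month) for every month touched by [start, end]."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def lower_bound(mm, ts):
    """Index of the first record with timestamp >= ts (file sorted by time)."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        if RECORD.unpack_from(mm, mid * RECORD.size)[0] < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query(root, start, end):
    """Return all (timestamp, price) records with start <= time < end."""
    t0, t1 = int(start.timestamp()), int(end.timestamp())
    out = []
    for y, m in months_between(start, end):
        path = partition_path(root, y, m)
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            continue  # missing or empty partition (empty files can't be mapped)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the matching slice of each file is decoded.
            for i in range(lower_bound(mm, t0), lower_bound(mm, t1)):
                out.append(RECORD.unpack_from(mm, i * RECORD.size))
    return out
```

Because the partitions are disjoint in time and visited in month order, "merging" reduces to concatenation; the extra cost compared with a single file is one open and map per partition touched.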
I hope that there will be more ideas here...
Best regards,
Daniel