[Rd] idea for "virtual matrix/array" class
Thomas Lumley
tlumley at u.washington.edu
Mon Aug 23 23:13:40 CEST 2004
On Mon, 23 Aug 2004, Tony Plate wrote:
>
> One idea I was thinking about was to have a new class of object that
> referred to data in a file on disk, and which had all the standard methods
> of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc. The
> object in memory would only store the array attributes, while the actual
> array data (the elements) would reside in a file. When some extraction
> method was called, it would access data in the file and return the
> appropriate data. With sensible use of seek operations, the data access
> could probably be quite fast. The file format of the object on disk could
> possibly be the standard serialized binary format as used in .RData
> files. Of course, if the object was larger than would fit in memory, then
> trying to extract too large a subarray would exhaust memory, but it should
> be possible to efficiently extract reasonably sized subarrays. To be more
> useful, one would want apply() to work with such arrays. That would
> be doable, either by creating a new method for apply, or possibly just for
> aperm.
This is what RPgSQL does with proxy data frames and what I did (read-only)
for netCDF access. It's a good idea if you have a data format for which
random access is fairly fast. I'm not sure that the standard serialized
binary format satisfies this. Fixed-format text files would work, but
free-format ones wouldn't -- seek() only helps when you can work out where
to seek without reading all the data.
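
To make the seek() point concrete, here is a minimal sketch in Python (not
R, and not the RPgSQL or netCDF code mentioned above). It assumes a
hypothetical file layout: raw 8-byte little-endian doubles stored in
column-major order with no header. The names write_columns, read_element
and demo are made up for illustration. Because every element has the same
size, the byte offset of element (i, j) follows from arithmetic alone.

```python
import os
import struct
import tempfile

# Assumed layout: each element is one 8-byte little-endian double.
ITEM = struct.Struct("<d")


def write_columns(path, columns):
    """Write a matrix to disk column by column (column-major order)."""
    with open(path, "wb") as f:
        for col in columns:
            for x in col:
                f.write(ITEM.pack(x))


def read_element(path, nrow, i, j):
    """Return element (i, j), 0-based, by seeking directly to its offset."""
    offset = (j * nrow + i) * ITEM.size  # fixed item size makes this computable
    with open(path, "rb") as f:
        f.seek(offset)
        return ITEM.unpack(f.read(ITEM.size))[0]


def demo():
    """Write a 2 x 3 matrix to a temporary file and read back one element."""
    columns = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_columns(path, columns)
        return read_element(path, nrow=2, i=1, j=2)
    finally:
        os.remove(path)
```

With a free-format text file there is no such formula: the position of
element (i, j) depends on the widths of everything written before it, so
the file has to be scanned from the start.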
-thomas