[Rd] idea for "virtual matrix/array" class

Thomas Lumley tlumley at u.washington.edu
Mon Aug 23 23:13:40 CEST 2004


On Mon, 23 Aug 2004, Tony Plate wrote:
>
> One idea I was thinking about was to have a new class of object that
> referred to data in a file on disk, and which had all the standard methods
> of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc.  The
> object in memory would only store the array attributes, while the actual
> array data (the elements) would reside in a file.  When some extraction
> method was called, it would access data in the file and return the
> appropriate data.  With sensible use of seek operations, the data access
> could probably be quite fast.  The file format of the object on disk could
> possibly be the standard serialized binary format as used in .RData
> files.  Of course, if the object was larger than would fit in memory, then
> trying to extract too large a subarray would exhaust memory, but it should
> be possible to efficiently extract reasonably sized subarrays.  To be more
> useful, one would want apply() to work with such arrays.  That would
> be doable, either by creating a new method for apply, or possibly just for
> aperm.

This is what RPgSQL does with proxy data frames and what I did (read-only)
for netCDF access. It's a good idea if you have a data format for which
random access is fairly fast.  I'm not sure that the standard serialized
binary format satisfies this.  Fixed-format text files would work, but
free-format ones wouldn't -- seek() only helps when you can work out where
to seek without reading all the data.
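
To make the seek() point concrete, here is a minimal sketch (in Python
rather than R, purely for illustration). It assumes a hypothetical
headerless file holding the matrix as column-major 8-byte little-endian
doubles; this is not the .RData serialization format, and the function
names are made up for the example. Because every element has the same
width, the byte offset of element (i, j) can be computed from the
dimensions alone, which is exactly what a free-format text file does not
allow.

```python
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # width of one stored double: 8 bytes


def write_matrix(path, rows):
    """Write a list-of-rows matrix column-major as raw doubles (no header)."""
    nrow, ncol = len(rows), len(rows[0])
    with open(path, "wb") as f:
        for j in range(ncol):
            for i in range(nrow):
                f.write(struct.pack("<d", rows[i][j]))
    return nrow, ncol


def read_element(path, nrow, i, j):
    """Fetch element (i, j), 0-based, by seeking directly to its offset."""
    offset = (j * nrow + i) * ITEM  # computable without reading any data
    with open(path, "rb") as f:
        f.seek(offset)
        return struct.unpack("<d", f.read(ITEM))[0]


def read_column(path, nrow, j):
    """Fetch column j with one seek and one contiguous read."""
    with open(path, "rb") as f:
        f.seek(j * nrow * ITEM)
        data = f.read(nrow * ITEM)
    return list(struct.unpack("<%dd" % nrow, data))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
    nrow, ncol = write_matrix(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(read_element(path, nrow, 1, 2))  # row 1, column 2 (0-based)
    print(read_column(path, nrow, 1))
```

In this layout a column is one contiguous read, while a row needs one
seek per element, so the storage order decides which subsets are cheap
to extract.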

	-thomas


