[Rd] idea for "virtual matrix/array" class
Barry Rowlingson
B.Rowlingson at lancaster.ac.uk
Tue Aug 24 10:54:12 CEST 2004
Thomas Lumley wrote:
> On Mon, 23 Aug 2004, Tony Plate wrote:
>
>>One idea I was thinking about was to have a new class of object that
>>referred to data in a file on disk, and which had all the standard methods
>>of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc.
>
> This is what RPgSql does with proxy dataframes and what I did (read-only)
> for netCDF access. It's a good idea if you have a data format for which
> random access is fairly fast. I'm not sure that the standard serialized
> binary format satisfies this. Fixed-format text files would work, but
> free-format ones wouldn't -- seek() only helps when you can work out where
> to seek without reading all the data.
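To illustrate the seek() point above, here is a minimal sketch with fixed-width binary records, where the byte offset of record i can be computed directly. The file layout (one 8-byte double per record) and the helper name read_record are assumptions made up for this example.

```r
# Write 1000 fixed-width records to a temporary file:
# each record is a single 8-byte double (assumed layout, for illustration).
path <- tempfile(fileext = ".bin")
writeBin(as.numeric(1:1000) * 1.5, path, size = 8)

# Hypothetical helper: fetch record i without reading records 1..(i-1).
# This only works because every record has the same known size.
read_record <- function(path, i, record_size = 8) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (i - 1) * record_size, origin = "start")
  readBin(con, what = "numeric", n = 1, size = record_size)
}

x <- read_record(path, 500)
```

With a free-format text file there is no such formula for the offset, so the file would have to be scanned (or indexed once) first.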
Just to join in on the 'done it' threads here, this is what my Rmap
package does with DBF files (they are the database component of ESRI
Shapefile maps). I use the dbf library from shapelib to access a DBF
file just like a data frame.
My dbf objects keep track of the selected rows and columns of the
database file, so it's possible to do:
db1 = db[1:10,]
and db1 is still a proxy object to the same DBF file as db, but with
attributes that tell it that it only has rows 1 to 10 in it. If you
really want a data frame, you just as.data.frame() it.
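As a rough sketch of the idea (not Rmap's actual code), a proxy can be an S3 object that stores only a file name plus row and column selections. The class name file_proxy is hypothetical, and a CSV file stands in for the DBF file here, so no shapelib calls are shown.

```r
# Hypothetical backing file standing in for a DBF file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100)^2), path, row.names = FALSE)

# Constructor: keep only the file name and the selected rows/columns.
file_proxy <- function(path, rows, cols) {
  structure(list(path = path, rows = rows, cols = cols), class = "file_proxy")
}

# Subsetting narrows the stored selection; no data is read here.
"[.file_proxy" <- function(x, i, j) {
  rows <- if (missing(i)) x$rows else x$rows[i]
  cols <- if (missing(j)) x$cols else x$cols[j]
  file_proxy(x$path, rows, cols)
}

# dim() is answered from the selection alone.
dim.file_proxy <- function(x) c(length(x$rows), length(x$cols))

# Only here is the file actually read. For simplicity this sketch reads
# the whole file and then subsets; a real backend would fetch only the
# selected records.
as.data.frame.file_proxy <- function(x, ...) {
  full <- read.csv(x$path)
  full[x$rows, x$cols, drop = FALSE]
}

db  <- file_proxy(path, rows = 1:100, cols = c("id", "value"))
db1 <- db[1:10, ]          # still a proxy on the same file
db2 <- db1[5:6, ]          # selections compose: rows 5 and 6 of the file
df  <- as.data.frame(db2)  # materialise as an ordinary data frame
```

Subsetting a proxy indexes into the stored selection rather than the file, so db1[5:6, ] refers to rows 5 and 6 of the original file.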
If you wanted to do this sort of thing for space-saving reasons you'd
have to be very careful, since for some operations R might slurp it all
into memory.
Baz
http://www.maths.lancs.ac.uk/Software/Rmap/