[Rd] idea for "virtual matrix/array" class

Barry Rowlingson B.Rowlingson at lancaster.ac.uk
Tue Aug 24 10:54:12 CEST 2004


Thomas Lumley wrote:
> On Mon, 23 Aug 2004, Tony Plate wrote:
> 
>>One idea I was thinking about was to have a new class of object that
>>referred to data in a file on disk, and which had all the standard methods
>>of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc. 
> 
> This is what RPgSql does with proxy dataframes and what I did (read-only)
> for netCDF access. It's a good idea if you have a data format for which
> random access is fairly fast.  I'm not sure that the standard serialized
> binary format satisfies this.  Fixed-format text files would work, but
> free-format ones wouldn't -- seek() only helps when you can work out where
> to seek without reading all the data.
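
  To make the seek() point concrete, here is a minimal sketch, assuming a 
binary file of fixed-size records (doubles stored row by row). read_rows 
is a hypothetical helper made up for illustration, not part of any 
package mentioned in this thread:

```r
# Hypothetical helper: read selected rows of a row-major matrix of doubles
# from a binary file without reading the rows that come before them.
read_rows <- function(path, rows, ncol, size = 8L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  out <- matrix(NA_real_, nrow = length(rows), ncol = ncol)
  for (k in seq_along(rows)) {
    # Every record has the same size, so the byte offset of a row
    # can be computed directly from its index.
    seek(con, where = (rows[k] - 1) * ncol * size, origin = "start")
    out[k, ] <- readBin(con, what = "double", n = ncol, size = size)
  }
  out
}

# Build a small test file: a 4 x 3 matrix written row by row.
m <- matrix(as.double(1:12), nrow = 4, ncol = 3)
path <- tempfile(fileext = ".bin")
writeBin(as.vector(t(m)), path)

read_rows(path, c(4, 2), ncol = 3)   # rows 4 and 2 only
```

  With a free-format text file there is no such formula for the offset, 
which is why you would have to scan the file first.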

  Just to join in on the 'done it' threads here, this is what my Rmap 
package does with DBF files (they are the database component of ESRI 
Shapefile maps). I use the dbf library from shapelib to access a DBF 
file just like a data frame.

  My dbf objects keep track of the selected rows and columns from the 
database file, so it's possible to do:

  db1 = db[1:10,]

  and db1 is still a proxy object to the same DBF file as db, but with 
attributes that tell it that it only has rows 1 to 10 in it. If you 
really want a data frame, you just as.data.frame() it.
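
  The idea can be sketched with an S3 class along these lines. This is 
only an illustration under stated assumptions, not the Rmap code: the 
class name "proxy" and its fields are hypothetical, and a CSV file stands 
in for the DBF file so that the example needs no extra libraries:

```r
# Hypothetical proxy class: it stores the file path plus the selected
# rows and columns, and reads the data only in as.data.frame().
proxy <- function(path) {
  # A CSV file has no header giving its shape, so this sketch scans it once.
  n_rows <- length(readLines(path)) - 1L          # minus the header line
  n_cols <- ncol(read.csv(path, nrows = 1))
  structure(list(path = path,
                 rows = seq_len(n_rows),
                 cols = seq_len(n_cols)),
            class = "proxy")
}

# Subsetting only narrows the stored selection; no data is read.
# Note: x[i] with a single index selects rows here, unlike a data frame.
"[.proxy" <- function(x, i, j) {
  if (!missing(i)) x$rows <- x$rows[i]
  if (!missing(j)) x$cols <- x$cols[j]
  x
}

dim.proxy <- function(x) c(length(x$rows), length(x$cols))

# Only here is the file actually read and the selection applied.
as.data.frame.proxy <- function(x, ...) {
  full <- read.csv(x$path)
  full[x$rows, x$cols, drop = FALSE]
}

path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:20, value = (1:20)^2), path, row.names = FALSE)

db  <- proxy(path)
db1 <- db[1:10, ]            # still a proxy to the same file
db2 <- db1[2:3, 2]           # selections compose
df  <- as.data.frame(db1)    # now a real data frame
```

  Note that as.data.frame() in this sketch reads the whole file before 
subsetting; a real implementation would ask the file library for just the 
selected records.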

  If you wanted to do this sort of thing for space-saving reasons you'd 
have to be very careful, since for some operations R might slurp it all 
into memory.

Baz

http://www.maths.lancs.ac.uk/Software/Rmap/


