[R] Using large datasets: can I overload the subscript operator?

Duncan Murdoch murdoch at stats.uwo.ca
Sat Mar 10 03:54:16 CET 2007


On 3/9/2007 6:47 PM, Maciej Radziejewski wrote:
> Hello,
> 
> I do some computations on datasets that come from climate models. These data
> are huge arrays, significantly larger than typically available RAM, so they
> have to be accessed row-by-row, or rather slice-by-slice, depending on the
> task. I would like to make an R package to easily access such datasets
> within R. The C++ backend is ready and being used under Windows/.Net/Visual
> Basic, but I have yet to learn the specifics of R programming to make a good
> R interface.
> 
> I think it should be possible to make a package (call it "slice") that could
> be used like this:
> 
> library (slice)
> dataset <- load.virtualarray ("dataset_definition.xml")
> # Load a portion of the data from disk and extract it
> ordinaryvector <- dataset [ , 2, 3]
> 
> In the above "dataset" is an object that holds a definition of a
> 3-dimensional large dataset, and "ordinaryvector" is an ordinary R vector.
> The subscripting operator fetches necessary data from disk and extracts a
> required slice, taking care of caching and other technical details. So, my
> questions are:
> 
> Has anyone ever made a similar extension, with virtual (lazy) arrays?

Yes, e.g. the SQLiteDF package.
> 
> Can the subscript operator be overloaded like that in R? (I know it can be in
> S, at least for vectors.)

Yes.
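
As a rough illustration of what such an overload could look like, here is a
minimal sketch using an S3 method for "[". The names virtualarray(), dims and
fetch are assumptions made up for this example, and an in-memory array stands
in for the real disk-backed C++ reader; it is not how any existing package does
it.

```r
# Hypothetical constructor: 'fetch' stands in for the real disk-backed reader.
# It must accept three index vectors and return a 3-dimensional array.
virtualarray <- function(dims, fetch) {
  structure(list(dims = dims, fetch = fetch), class = "virtualarray")
}

# S3 method for "[": R calls this for dataset[i, j, k] because the class
# attribute of 'dataset' is "virtualarray".
`[.virtualarray` <- function(x, i, j, k, drop = TRUE) {
  d <- x$dims
  # An empty subscript (as in dataset[, 2, 3]) means "the whole dimension".
  if (missing(i)) i <- seq_len(d[1])
  if (missing(j)) j <- seq_len(d[2])
  if (missing(k)) k <- seq_len(d[3])
  out <- x$fetch(i, j, k)          # only the requested slice is fetched
  if (drop) base::drop(out) else out
}

# Stand-in backend: a small in-memory array instead of data on disk.
backing <- array(seq_len(24), dim = c(2, 3, 4))
dataset <- virtualarray(
  dim(backing),
  function(i, j, k) backing[i, j, k, drop = FALSE]
)

ordinaryvector <- dataset[, 2, 3]          # an ordinary R vector
kept <- dataset[, 2, 3, drop = FALSE]      # still a 3-dimensional array
```

Caching and the actual file access would live inside the fetch function, so
the "[" method itself stays small.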
> 
> And a tough one: is it possible to make an expression like "[1]" (without
> quotes) meaningful in R? At the moment it results in a syntax error. I
> would like to make it return an object of a special class that gets
> interpreted when subscripting my virtual array as "drop this dimension",
> like this:
> 
> dataset [, 2, 3, drop = FALSE]  # Return a 3-dimensional array
> dataset [, [2], 3, drop = FALSE]  # Return a 2-dimensional array
> dataset [, [2], [3], drop = FALSE]  # Return a 1-dimensional array, like dataset [, 2, 3]

No, that's not legal S or R syntax.  However, you might be able to 
define a special object D and use syntax like

dataset [, D[2], 3, drop = FALSE]
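
One way this might be wired up, as a sketch only: give D its own "[" method
that wraps the index in a marker class, and have the array's "[" method drop
exactly the marked dimensions. The class names "dropmarker" and "dropindex"
and the virtualarray() constructor are invented for this example, and the data
is held in memory rather than read from disk.

```r
# D is an empty object whose only job is to dispatch on "[".
D <- structure(list(), class = "dropmarker")

# D[n] returns the index wrapped in a marker class: "drop this dimension".
`[.dropmarker` <- function(x, i) {
  stopifnot(length(i) == 1)        # only a single index can be dropped
  structure(list(index = i), class = "dropindex")
}

# Hypothetical stand-in for the virtual array: wraps an in-memory 3-d array.
virtualarray <- function(data) {
  structure(list(data = data), class = "virtualarray")
}

`[.virtualarray` <- function(x, i, j, k, drop = TRUE) {
  a <- x$data
  d <- dim(a)
  if (missing(i)) i <- seq_len(d[1])
  if (missing(j)) j <- seq_len(d[2])
  if (missing(k)) k <- seq_len(d[3])
  idx <- list(i, j, k)
  # Which subscripts were written as D[n]?
  marked <- vapply(idx, inherits, logical(1), what = "dropindex")
  # Unwrap the markers to get plain indices.
  plain <- lapply(idx, function(ix) {
    if (inherits(ix, "dropindex")) ix$index else ix
  })
  out <- a[plain[[1]], plain[[2]], plain[[3]], drop = FALSE]
  # Remove only the marked dimensions, regardless of 'drop'.
  if (any(!marked)) dim(out) <- dim(out)[!marked] else dim(out) <- NULL
  if (drop) base::drop(out) else out
}

dataset <- virtualarray(array(seq_len(24), dim = c(2, 3, 4)))

a3 <- dataset[, 2, 3, drop = FALSE]        # 3-dimensional: 2 x 1 x 1
a2 <- dataset[, D[2], 3, drop = FALSE]     # 2-dimensional: 2 x 1
a1 <- dataset[, D[2], D[3], drop = FALSE]  # 1-dimensional: length 2
```

Because D[2] is ordinary R syntax, no change to the parser is needed; the cost
is one extra letter per dropped subscript.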

Duncan Murdoch
> 
> Thanks in advance for any help,
> 
> Maciej.
> 