[R] Using large datasets: can I overload the subscript operator?

Roger Bivand Roger.Bivand at nhh.no
Sat Mar 10 08:43:18 CET 2007


On Sat, 10 Mar 2007, Maciej Radziejewski wrote:

> Hello,
> 

The http://www.met.rdg.ac.uk/cag/rclim/ site may have some useful leads. 
In addition, you'll find ideas in two packages created by Tim Keitt, 
rgdal, and Rdbi+RdbiPgSQL (now on Bioconductor). 

> I do some computations on datasets that come from climate models. These data
> are huge arrays, significantly larger than typically available RAM, so they
> have to be accessed row-by-row, or rather slice-by-slice, depending on the
> task. I would like to make an R package to easily access such datasets
> within R. The C++ backend is ready and being used under Windows/.Net/Visual
> Basic, but I have yet to learn the specifics of R programming to make a good
> R interface.

Look at the Matrix package for examples - you may need finalizers to tidy 
up memory allocation - see examples in rgdal. The key thing will be 
thinking through how to implement the R objects as classes, probably not 
simply reflecting the C++ classes. Classes are covered in the Green Book 
(Chambers 1998) and Venables & Ripley (2000) S Programming.
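
As a rough sketch of the finalizer idea (not code from rgdal or Matrix):
the class name VirtualArray, its slots and openVirtualArray() are all
made up for illustration, and an environment stands in for the external
pointer a real package would get from its compiled backend.

```r
library(methods)

# Hypothetical class: the slots describe the dataset, the handle stands in
# for whatever resource the C++ backend would own.
setClass("VirtualArray",
         representation(dims = "integer", handle = "environment"))

# Hypothetical constructor.
openVirtualArray <- function(dims) {
  handle <- new.env()
  handle$open <- TRUE
  # Runs when the handle is garbage collected (and at session exit);
  # a real package would release backend memory or file handles here.
  reg.finalizer(handle, function(e) e$open <- FALSE, onexit = TRUE)
  new("VirtualArray", dims = as.integer(dims), handle = handle)
}

# Let dim() work on the R-level object without touching the data.
setMethod("dim", "VirtualArray", function(x) x@dims)

va <- openVirtualArray(c(4, 3, 2))
dim(va)
```

The point of the design is that the R object carries only a description
of the data plus a handle, and the finalizer ties the lifetime of the
backend resource to the lifetime of that handle.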

> 
> I think it should be possible to make a package (call it "slice") that could
> be used like this:
> 
> library (slice)
> dataset <- load.virtualarray ("dataset_definition.xml")
> # Load a portion of the data from disk and extract it
> ordinaryvector <- dataset [ , 2, 3]
> 
> In the above "dataset" is an object that holds a definition of a
> 3-dimensional large dataset, and "ordinaryvector" is an ordinary R vector.
> The subscripting operator fetches necessary data from disk and extracts a
> required slice, taking care of caching and other technical details. So, my
> questions are:
> 
> Has anyone ever made a similar extension, with virtual (lazy) arrays?
> 
> Can the subscript operator be overloaded like that in R? (I know it can be in
> S, at least for vectors.)
> 

Yes, there are many examples; see the Matrix package for some that use
new-style (S4) classes. On language issues like this R behaves like S;
the differences are mainly in scoping.
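
A minimal sketch of such an overload with a new-style class, under some
assumptions: the class DiskMatrix and its slots are invented here, the
example is kept to two dimensions for brevity, and the data are assumed
to be doubles stored column-major in a plain binary file. It is not how
Matrix does it, only an illustration of a "[" method that reads from
disk on demand.

```r
library(methods)

# Hypothetical class: only the file path and the dimensions stay in memory.
setClass("DiskMatrix", representation(path = "character", dims = "integer"))

setMethod("[", "DiskMatrix", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) i <- seq_len(x@dims[1])
  if (missing(j)) j <- seq_len(x@dims[2])
  con <- file(x@path, "rb")
  on.exit(close(con))
  # Read one requested column at a time, then keep the requested rows.
  cols <- lapply(j, function(jj) {
    seek(con, where = (jj - 1) * x@dims[1] * 8, origin = "start")
    readBin(con, what = "double", n = x@dims[1], size = 8)[i]
  })
  out <- matrix(unlist(cols), nrow = length(i), ncol = length(j))
  if (drop) base::drop(out) else out
})

# Usage: write a small 4 x 3 matrix to a temporary file, then subscript it.
m <- matrix(as.double(1:12), nrow = 4)
path <- tempfile()
con <- file(path, "wb")
writeBin(as.vector(m), con, size = 8)
close(con)

dm <- new("DiskMatrix", path = path, dims = dim(m))
dm[, 2]            # second column, read from disk
dm[2:3, c(1, 3)]   # a 2 x 2 block
```

Caching and a third index are left out; both would go inside the method
body without changing how the operator is called.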

> And a tough one: is it possible to make an expression like "[1]" (without
> quotes) meaningful in R? At the moment it results in a syntax error. I
> would like to make it return an object of a special class that gets
> interpreted when subscripting my virtual array as "drop this dimension",
> like this:

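Most likely not: a bare "[2]" is not a complete R expression, because "["
must follow the object being subscripted, so the parser rejects it before
any method gets to see it. But if your "[.dataset" method is careful about
examining its arguments, you ought to be able to get the result you want,
for instance by passing an index wrapped in an object of a special class.
You'll likely learn a good deal from looking, for example, at the code in
the Matrix package.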
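
One way this might look, as a sketch only: dropdim() and as.dataset() are
hypothetical helpers invented for the example, an in-memory array stands
in for the disk backend, and dropdim(2) plays the role of the "[2]" in
the example quoted below.

```r
# Hypothetical marker: "select this single index and drop the dimension".
dropdim <- function(index) {
  stopifnot(length(index) == 1)
  structure(list(index = index), class = "dropdim")
}

# Hypothetical constructor; a real package would store a file definition.
as.dataset <- function(a) structure(list(data = a), class = "dataset")

"[.dataset" <- function(x, i, j, k, drop = TRUE) {
  d <- dim(x$data)
  idx <- list(if (missing(i)) seq_len(d[1]) else i,
              if (missing(j)) seq_len(d[2]) else j,
              if (missing(k)) seq_len(d[3]) else k)
  # Examine the arguments: which indices carry the marker class?
  marked <- vapply(idx, inherits, logical(1), what = "dropdim")
  idx[marked] <- lapply(idx[marked], function(m) m$index)
  out <- x$data[idx[[1]], idx[[2]], idx[[3]], drop = FALSE]
  if (drop) return(base::drop(out))
  # With drop = FALSE, remove only the dimensions explicitly marked.
  if (any(marked)) array(out, dim = dim(out)[!marked]) else out
}

ds <- as.dataset(array(1:36, dim = c(4, 3, 3)))
dim(ds[, 2, 3, drop = FALSE])                    # three dimensions kept
dim(ds[, dropdim(2), 3, drop = FALSE])           # two dimensions
dim(ds[, dropdim(2), dropdim(3), drop = FALSE])  # one dimension
```

Because the marker is an ordinary function call, the parser accepts it,
and the method decides what it means.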

> 
> dataset [, 2, 3, drop = F]  # Return a 3-dimensional array
> dataset [, [2], 3, drop = F]  # Return a 2-dimensional array
> dataset [, [2], [3], drop = F]  # Return a 1-dimensional array,
>                                 # like dataset [, 2, 3]
> 
> Thanks in advance for any help,
> 
> Maciej.

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


