[R] Using large datasets: can I overload the subscript operator?

Roy Mendelssohn Roy.Mendelssohn at noaa.gov
Sat Mar 10 04:01:11 CET 2007


Look at the netCDF packages.  A lot of output from climate models is
in netCDF anyway, and those packages can read all sorts of slices and strides.

-Roy M.
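For illustration, here is a rough sketch of reading one slice from a netCDF file. It assumes the ncdf4 package is installed (an assumption: the thread names no specific package). The file, the dimension names, and the variable name "temp" are made up. The example writes a tiny 4 x 3 x 2 file to a temporary path, then reads back only the values at y = 2, t = 1 using start/count.

```r
# Runs only if ncdf4 is available; everything below is illustrative.
if (requireNamespace("ncdf4", quietly = TRUE)) {
  library(ncdf4)

  # Build a small 4 x 3 x 2 test file (hypothetical names and units).
  path <- tempfile(fileext = ".nc")
  dx <- ncdim_def("x", "index", 1:4)
  dy <- ncdim_def("y", "index", 1:3)
  dt <- ncdim_def("t", "index", 1:2)
  v  <- ncvar_def("temp", "K", list(dx, dy, dt), missval = -9999)
  nc <- nc_create(path, v)
  ncvar_put(nc, v, array(1:24, dim = c(4, 3, 2)))
  nc_close(nc)

  # Read a single slice: all of x (count = -1), y = 2, t = 1.
  nc <- nc_open(path)
  slice <- ncvar_get(nc, "temp", start = c(1, 2, 1), count = c(-1, 1, 1))
  nc_close(nc)
}
```

Only the requested hyperslab is read from the file, which is the point when the full array does not fit in RAM.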


On Mar 9, 2007, at 6:54 PM, Duncan Murdoch wrote:

> On 3/9/2007 6:47 PM, Maciej Radziejewski wrote:
>> Hello,
>>
>> I do some computations on datasets that come from climate models. These
>> data are huge arrays, significantly larger than typically available RAM,
>> so they have to be accessed row-by-row, or rather slice-by-slice,
>> depending on the task. I would like to make an R package to easily access
>> such datasets within R. The C++ backend is ready and being used under
>> Windows/.Net/Visual Basic, but I have yet to learn the specifics of R
>> programming to make a good R interface.
>>
>> I think it should be possible to make a package (call it "slice") that
>> could be used like this:
>>
>> library (slice)
>> dataset <- load.virtualarray ("dataset_definition.xml")
>> # Load a portion of the data from disk and extract it
>> ordinaryvector <- dataset [ , 2, 3]
>>
>> In the above, "dataset" is an object that holds a definition of a large
>> 3-dimensional dataset, and "ordinaryvector" is an ordinary R vector. The
>> subscripting operator fetches the necessary data from disk and extracts
>> the required slice, taking care of caching and other technical details.
>> So, my questions are:
>>
>> Has anyone ever made a similar extension, with virtual (lazy) arrays?
>
> Yes, e.g. the SQLiteDF package.
>>
>> Can the subscript operator be overloaded like that in R? (I know it can
>> be in S, at least for vectors.)
>
> Yes.
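A minimal sketch of how such an overload might look with an S3 method for `[`. The class name "virtualarray", the constructor, and the reader function are all hypothetical; the reader computes values from the indices and merely stands in for the disk-backed C++ backend described above.

```r
# Hypothetical constructor: stores the dimensions and a reader function
# instead of the data itself.
virtual_array <- function(dims, read_fun) {
  structure(list(dims = dims, read = read_fun), class = "virtualarray")
}

# Stand-in for the real backend: builds each element from its indices
# (value = i + 10 * j + 100 * k) rather than reading from disk.
fake_reader <- function(i, j, k) {
  idx <- expand.grid(i = i, j = j, k = k)
  array(idx$i + 10 * idx$j + 100 * idx$k,
        dim = c(length(i), length(j), length(k)))
}

# S3 method for `[`: an empty subscript means "the whole dimension".
"[.virtualarray" <- function(x, i, j, k, drop = TRUE) {
  if (missing(i)) i <- seq_len(x$dims[1])
  if (missing(j)) j <- seq_len(x$dims[2])
  if (missing(k)) k <- seq_len(x$dims[3])
  out <- x$read(i, j, k)          # only the requested slice is materialised
  if (drop) base::drop(out) else out
}

dataset <- virtual_array(c(4, 3, 3), fake_reader)
ordinaryvector <- dataset[, 2, 3]          # plain vector of length 4
kept <- dataset[, 2, 3, drop = FALSE]      # 4 x 1 x 1 array
```

Caching, bounds checking, and negative or logical subscripts are left out; a real package would need to handle them.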
>>
>> And a tough one: is it possible to make an expression like "[1]"
>> (without quotes) meaningful in R? At the moment it results in a syntax
>> error. I would like to make it return an object of a special class that
>> gets interpreted when subscripting my virtual array as "drop this
>> dimension", like this:
>>
>> dataset [, 2, 3, drop = F]  # Return a 3-dimensional array
>> dataset [, [2], 3, drop = F]  # Return a 2-dimensional array
>> dataset [, [2], [3], drop = F]  # Return a 1-dimensional array, like dataset [, 2, 3]
>
> No, that's not legal S or R syntax.  However, you might be able to
> define a special object D and use syntax like
>
> dataset [, D[2], 3, drop = F]
>
> Duncan Murdoch
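One possible reading of the D[2] suggestion, as a sketch only. The class names "dropmarker", "dropindex", and "markedarray" and the helper wrap() are invented here, and an in-memory array stands in for the disk-backed data. Each marked subscript is assumed to select exactly one element.

```r
# Marker object: D[2] yields the index 2 tagged as "drop this dimension".
D <- structure(list(), class = "dropmarker")
"[.dropmarker" <- function(x, i) {
  structure(list(index = i), class = "dropindex")
}

# Hypothetical wrapper around an ordinary 3-d array.
wrap <- function(a) structure(list(data = a), class = "markedarray")

"[.markedarray" <- function(x, i, j, k, drop = TRUE) {
  d <- dim(x$data)
  if (missing(i)) i <- seq_len(d[1])
  if (missing(j)) j <- seq_len(d[2])
  if (missing(k)) k <- seq_len(d[3])
  idx <- list(i, j, k)
  # Which subscripts were written as D[...]?
  marked <- vapply(idx, inherits, logical(1), what = "dropindex")
  plain <- lapply(idx, function(s) if (inherits(s, "dropindex")) s$index else s)
  stopifnot(all(lengths(plain[marked]) == 1))
  out <- x$data[plain[[1]], plain[[2]], plain[[3]], drop = FALSE]
  if (drop) return(base::drop(out))
  keep <- !marked
  if (!any(keep)) return(as.vector(out))
  dim(out) <- dim(out)[keep]      # remove only the marked dimensions
  out
}

dataset <- wrap(array(1:36, dim = c(4, 3, 3)))
a3 <- dataset[, 2, 3, drop = FALSE]        # 3-dimensional: 4 x 1 x 1
a2 <- dataset[, D[2], 3, drop = FALSE]     # 2-dimensional: 4 x 1
a1 <- dataset[, D[2], D[3], drop = FALSE]  # 1-dimensional: length 4
```

Note that a variable named D masks stats::D only for value lookup; calls such as D(expr, "x") still find the function.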
>>
>> Thanks in advance for any help,
>>
>> Maciej.
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

**********************
"The contents of this message do not reflect any position of the U.S.  
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division	
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."