[R] Passing references to data objects into R functions

David Khabie-Zeitoune dave at evocapital.com
Wed Jul 23 18:17:41 CEST 2003


I have the following question about reading from large data objects from
within R functions; I have tried to simplify my problem as much as
possible in what follows.

Imagine I have various large data objects sitting in my global
environment (call them "data1", "data2", ...).  I want to write a
function "extract" that extracts some of the rows of a particular data
object, does some further manipulations on the extract and then returns
the result. The function takes the data object's name and an index
vector -- for example the following call would return the first 3 rows
of object data1. 

ans = extract("data1", 1:3)

I could write a simple function like this:

extract1 = function(object.name, index) {

    temp = get(object.name, envir = .GlobalEnv)
    temp = temp[index, , drop=FALSE]   # keep only the requested rows

    # do some further manipulations here ....

    temp
}

The problem is that the function makes a copy "temp" of the object in
the function frame, which (in my application) is very memory inefficient
as the data objects are very large. It is especially inefficient when
the length of the "index" vector is much smaller than the number of rows
in the data object. What I really would like to do is to be able to read
from the underlying data object directly (in other programming languages
this would be achieved by passing a pointer to the object instead),
without making a copy.

Given R's scoping rules for variable names, I could avoid making the
copy by defining the function like this instead:

extract2 = function(object.name, index) {

    # build the subsetting expression as text, then parse and evaluate it
    eval(parse(text = paste("temp = ", object.name, "[index, , drop=FALSE]",
                            sep = "")))
    # do some further manipulations here ....

    temp
}

But this seems very messy. Is there a better way?
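
One variant I can imagine, which at least avoids pasting code together
as text, is to build the subsetting call from a symbol with substitute()
and evaluate it in the global environment. This is only a sketch: the
name "extract3" and the small matrix "data1" below are made up for
illustration, and I have not checked whether it uses any less memory
than the versions above.

```r
# Hypothetical variant: construct the call data[index, , drop = FALSE]
# from the object's name instead of parsing a string.
extract3 <- function(object.name, index) {
    expr <- substitute(x[index, , drop = FALSE],
                       list(x = as.name(object.name), index = index))
    # evaluate the call where the data objects live
    temp <- eval(expr, envir = .GlobalEnv)

    # do some further manipulations here ....

    temp
}

# small stand-in for one of the large data objects in the global environment
assign("data1", matrix(1:12, nrow = 4), envir = .GlobalEnv)
ans <- extract3("data1", 1:3)
```

The call is the same as in extract2, but the object name is turned into
a symbol by as.name() rather than being spliced into source text.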

Thanks for your help

David Khabie-Zeitoune
