[R] What's the best way to tell a function about relevant fields in data frames

Titus von der Malsburg malsburg at gmail.com
Tue May 12 12:18:59 CEST 2009

Hi list,

I have a function that detects saccadic eye movements in a time series
of eye positions sampled at a rate of 250 Hz.  This function needs
three vectors: x-coordinate, y-coordinate, trial-id.  This information
is usually contained in a data frame that also has some other fields.
The names of the fields are not standardized.

> head(eyemovements)
        time     x      y trial
51 880446504 53.18 375.73     1
52 880450686 53.20 375.79     1
53 880454885 53.35 376.14     1
54 880459060 53.92 376.39     1
55 880463239 54.14 376.52     1
56 880467426 54.46 376.74     1

There are now several possibilities for the signature of the function:

1. Passing the columns separately:

    detect(eyemovements$x, eyemovements$y, eyemovements$trial)

   or:

    with(eyemovements,
         detect(x, y, trial))

2. Passing the data frame plus the names of the fields:

    detect(eyemovements, "x", "y", "trial")

3. Passing the data frame plus a formula specifying the relevant fields:

    detect(eyemovements, ~x+y|trial)

4. Passing a formula and getting the data from the environment:

    detect(~x+y|trial)

I have seen instances of all these variants (and others) in the wild.
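
To make the variants concrete, here is a minimal sketch of how each
signature might extract the three vectors.  The names detect_core,
detect_names and detect_formula are made up for illustration, the
formula is assumed to have the shape ~ x + y | trial, and the body of
detect_core is only a placeholder, not the actual saccade detection:

```r
# Hypothetical core: every variant ends up calling this with three
# plain vectors (this is variant 1).
detect_core <- function(x, y, trial) {
  stopifnot(length(x) == length(y), length(y) == length(trial))
  # Placeholder for the real algorithm: count the samples per trial.
  tapply(x, trial, length)
}

# Variant 2: data frame plus the field names as strings.
detect_names <- function(data, x, y, trial) {
  detect_core(data[[x]], data[[y]], data[[trial]])
}

# Variants 3 and 4: a formula of the (assumed) shape ~ x + y | trial.
# If no data frame is given, the variables are looked up in the
# environment in which the formula was created.
detect_formula <- function(formula, data = environment(formula)) {
  vars <- all.vars(formula)   # variable names in order of appearance
  stopifnot(length(vars) == 3)
  vals <- lapply(vars, function(v)
    eval(as.name(v), data, environment(formula)))
  detect_core(vals[[1]], vals[[2]], vals[[3]])
}
```

With these definitions, detect_formula(~x+y|trial, data = eyemovements)
corresponds to variant 3 (with the argument order swapped), and
with(eyemovements, detect_formula(~x+y|trial)) to variant 4.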

Is there a canonical way to tell a function which fields in a data
frame are relevant?  What other alternatives are possible?  What are
the pros and cons of the alternatives?

Thanks, Titus

