[R] What's the best way to tell a function about relevant fields in data frames

Duncan Murdoch murdoch at stats.uwo.ca
Tue May 12 12:55:54 CEST 2009


On 12/05/2009 6:18 AM, Titus von der Malsburg wrote:
> Hi list,
> 
> I have a function that detects saccadic eye movements in a time series
> of eye positions sampled at a rate of 250Hz.  This function needs
> three vectors: x-coordinate, y-coordinate, trial-id.  This information
> is usually contained in a data frame that also has some other fields.
> The names of the fields are not standardized.
> 
>> head(eyemovements)
>         time     x      y trial
> 51 880446504 53.18 375.73     1
> 52 880450686 53.20 375.79     1
> 53 880454885 53.35 376.14     1
> 54 880459060 53.92 376.39     1
> 55 880463239 54.14 376.52     1
> 56 880467426 54.46 376.74     1
> 
> There are now several possibilities for the signature of the function:
> 
> 1. Passing the columns separately:
> 
>     detect(eyemovements$x, eyemovements$y, eyemovements$trial)
> 
>   or:
> 
>     with(eyemovements,
>          detect(x, y, trial))

I'd choose this one, with one modification described below.

> 
> 2. Passing the data frame plus the names of the fields:
> 
>     detect(eyemovements, "x", "y", "trial")

I think this is too inflexible.  What if you want to temporarily change 
one variable?  You don't want to have to create a whole new data frame; 
it's better to just substitute in another variable.
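
A minimal sketch of what I mean by substituting a variable. The detect() 
here is only a hypothetical stand-in that bundles its inputs (the real 
saccade detection is not shown), and the data are the first three rows 
from your example:

```r
# Hypothetical stand-in for the real detector: it only checks and bundles
# its inputs, which is enough to show the calling convention.
detect <- function(x, y, trial) {
  stopifnot(length(x) == length(y), length(x) == length(trial))
  data.frame(x = x, y = y, trial = trial)
}

eyemovements <- data.frame(
  time  = c(880446504, 880450686, 880454885),
  x     = c(53.18, 53.20, 53.35),
  y     = c(375.73, 375.79, 376.14),
  trial = c(1, 1, 1)
)

# Ordinary call on the stored columns.
res <- with(eyemovements, detect(x, y, trial))

# Temporarily replace x with a shifted copy; the data frame is untouched.
res_shifted <- with(eyemovements, detect(x - 50, y, trial))
```

With signature 2 the second call would need a modified copy of the whole 
data frame just to change one column.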

> 
> 3. Passing the data frame plus a formula specifying the relevant
> fields:
> 
>     detect(eyemovements, ~x+y|trial)
> 
> 4. Passing a formula and getting the data from the environment:
> 
>     with(eyemovements,
>          detect(~x+y|trial))

Rather than 3 or 4, I would use the more common idiom

    detect(~x+y|trial, data=eyemovements)

(and the formula might be x+y~trial).  But I think the formula interface 
is too general for your needs.  What would ~x+y+z|trial mean?
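
To illustrate the ambiguity, here is a rough sketch of a hypothetical 
detect_formula().  It is an assumption for illustration, not how modelling 
functions usually process formulas: it just pulls the variable names out 
with all.vars(), ignores what the operators mean, and refuses anything 
other than exactly three variables.

```r
# Hypothetical formula front end: the first two variables are taken as the
# coordinates and the third as the trial id. The operators are ignored.
detect_formula <- function(formula, data) {
  vars <- all.vars(formula)
  if (length(vars) != 3L) {
    stop("expected exactly three variables (x, y, trial), got ",
         length(vars))
  }
  list(x = data[[vars[1]]], y = data[[vars[2]]], trial = data[[vars[3]]])
}

eyemovements <- data.frame(
  x     = c(53.18, 53.20, 53.35),
  y     = c(375.73, 375.79, 376.14),
  trial = c(1, 1, 1)
)

ok <- detect_formula(~ x + y | trial, data = eyemovements)

# A formula with an extra term has no obvious meaning here, so it is
# rejected rather than guessed at.
bad <- tryCatch(detect_formula(~ x + y + z | trial, data = eyemovements),
                error = function(e) conditionMessage(e))
```

The function has to reject most of the formulas a user could write, which 
is what I mean by the interface being too general.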

I'd suggest something like 1, but using the convention plot.default() 
uses, where you have x and y arguments, but y can be skipped if x is a 
matrix, data frame, formula, or list.  It uses the xy.coords() function 
to do the extraction.
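
Something along these lines, as a sketch only.  The name detect_xy() and 
its body are assumptions for illustration; the only real piece is 
grDevices::xy.coords(), which accepts either two vectors or a single 
two-column object:

```r
# Hypothetical wrapper following the plot.default() convention:
# y may be omitted when x already carries both coordinates.
detect_xy <- function(x, y = NULL, trial) {
  xy <- grDevices::xy.coords(x, y)
  stopifnot(length(xy$x) == length(trial))
  # Placeholder for the real detection: just return the extracted inputs.
  data.frame(x = xy$x, y = xy$y, trial = trial)
}

eyemovements <- data.frame(
  time  = c(880446504, 880450686, 880454885),
  x     = c(53.18, 53.20, 53.35),
  y     = c(375.73, 375.79, 376.14),
  trial = c(1, 1, 1)
)

# Two separate vectors, as in signature 1.
a <- with(eyemovements, detect_xy(x, y, trial = trial))

# One two-column data frame; y is skipped, so trial must be named.
b <- detect_xy(eyemovements[c("x", "y")], trial = eyemovements$trial)
```

Both calls extract the same coordinates, so the caller can pass whichever 
form is more convenient.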

Duncan Murdoch

> 
> I saw instances of all those variants (and others) in the wild.
> 
> Is there a canonical way to tell a function which fields in a data
> frame are relevant?  What other alternatives are possible?  What are
> the pros and cons of the alternatives?
> 
> Thanks, Titus
> 



