[Rd] Most efficient way to check the length of a variable mentioned in a formula.

William Dunlap wdunlap at tibco.com
Fri Oct 17 22:16:35 CEST 2014


I got the default value for getRHSLength's data argument wrong - it
should be NULL, not parent.frame():
   getRHSLength <- function (formula, data = NULL)
   {
       rhsExpr <- formula[[length(formula)]]
       rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
       length(rhsValue)
   }
so that the function firstHalf is found in the following
   > X <- 1:10
   > getRHSLength((function(){firstHalf<-function(x)x[seq_len(floor(length(x)/2))];
   +     ~firstHalf(X)})())
   [1] 5
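
To make the difference between the two defaults concrete, here is a minimal sketch. The names getRHSLengthOld, getRHSLengthNew and makeFormula are only illustrative, and X is defined inside makeFormula (rather than globally, as above) so that the result does not depend on the calling environment.

```r
# Earlier version: data defaults to the caller's frame (an environment).
getRHSLengthOld <- function(formula, data = parent.frame()) {
  rhsExpr <- formula[[length(formula)]]
  length(eval(rhsExpr, envir = data, enclos = environment(formula)))
}

# Corrected version: data defaults to NULL, so lookup starts in enclos.
getRHSLengthNew <- function(formula, data = NULL) {
  rhsExpr <- formula[[length(formula)]]
  length(eval(rhsExpr, envir = data, enclos = environment(formula)))
}

# Hypothetical helper: the formula's environment holds both firstHalf and X.
makeFormula <- function() {
  firstHalf <- function(x) x[seq_len(floor(length(x) / 2))]
  X <- 1:10
  ~ firstHalf(X)
}
f <- makeFormula()

# Found via environment(formula).
newLen <- getRHSLengthNew(f)

# The caller's frame does not contain firstHalf, so this signals an error;
# the error message is captured instead of stopping the script.
oldResult <- tryCatch(getRHSLengthOld(f),
                      error = function(err) conditionMessage(err))
```

When envir is an environment, eval() ignores enclos, so the old default never consults environment(formula) directly. With envir = NULL the lookup goes straight to enclos.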


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Oct 17, 2014 at 11:57 AM, William Dunlap <wdunlap at tibco.com> wrote:
> I would use eval(), but I think that most formula-using functions do
> it more like the following.
>
> getRHSLength <-
> function (formula, data = parent.frame())
> {
>     rhsExpr <- formula[[length(formula)]]
>     rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
>     length(rhsValue)
> }
>
> * use eval() instead of get(): eval() can evaluate any expression on the
> right-hand side, not just a single name, and when envir is an environment
> it searches the ancestral environments of envir as well as envir itself.
> * evaluate only the expression from the formula in the non-standard
> evaluation frame, and call length() in the current frame.  Otherwise, if
> envir inherits directly from emptyenv(), the 'length' function will not
> be found.
> * use envir=data so that variables are looked up first in the data argument
> * the enclos argument is used only if envir is not an environment (e.g. a
> list or data frame); it is where variables not found in envir are looked up.
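
The points about emptyenv() and enclos can be checked with a small sketch. The environment name bare is an assumption made for illustration; everything else is base R.

```r
# An environment whose only parent is emptyenv(): no base functions visible.
bare <- new.env(parent = emptyenv())
assign("X", 1:4, envir = bare)

# Evaluating length(X) inside 'bare' fails because 'length' cannot be found;
# the error message is captured rather than stopping the script.
inside <- tryCatch(eval(quote(length(X)), envir = bare),
                   error = function(err) conditionMessage(err))

# Evaluating only X there and calling length() in the current frame works.
outside <- length(eval(quote(X), envir = bare))

# enclos is consulted only when envir is not an environment:
# an empty list falls through to enclos, a data frame with X shadows it.
fromEnclos <- length(eval(quote(X), envir = list(), enclos = bare))
fromData <- length(eval(quote(X), envir = data.frame(X = 1:2), enclos = bare))
```

Here outside and fromEnclos are both 4, fromData is 2, and inside holds an error message about the missing 'length' function.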
>
> Here are some examples:
>   > X <- 1:10
>   > getRHSLength(~X)
>   [1] 10
>   > getRHSLength(~X, data=data.frame(X=1:2))
>   [1] 2
>   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
>   [1] 4
>   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
>   [1] 2
>   > getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
>   [1] 10
>   > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
>   Error in eval(expr, envir, enclos) : object 'X' not found
>
> I think you will see the same lookups if you try analogous things with lm().
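
A rough sketch of the analogous check with lm() follows. The helper makeFormula and the numbers in it are made up for illustration; nobs() reports how many observations the fit used.

```r
# Hypothetical helper: x and y live only in the formula's environment.
makeFormula <- function() {
  x <- c(1, 2, 3, 4)
  y <- c(2.1, 3.9, 6.2, 7.8)
  y ~ x
}
f <- makeFormula()

# No data argument: variables are taken from environment(formula).
nFromFormulaEnv <- nobs(lm(f))

# With a data argument: variables in the data frame take precedence.
nFromData <- nobs(lm(f, data = data.frame(x = 1:3, y = c(1.2, 1.9, 3.1))))
```

The first fit uses the four observations from the formula's environment, the second the three rows of the data frame.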
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
>> Dear R gurus,
>>
>> I need to know the length of a variable (let's call it X) that is
>> mentioned in a formula. So obviously I look up the environment in which
>> the formula was created, and then I have two options:
>>
>> - using eval(parse(text='length(X)'),
>>                     envir=environment(formula) )
>>
>> - using length(get('X',
>>             envir=environment(formula)))
>>
>> A bit of benchmarking showed that the first option is about 20 times
>> slower, but even over 10,000 repetitions the second option saves only a
>> bit more than half a second, so speed is not really an issue here.
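
A comparison along these lines could be sketched as follows. The helper names makeFormula, lenParse and lenGet are assumptions, the get() call is written with envir passed to get(), and the timings will vary by machine, so no expected values are shown.

```r
# Hypothetical setup: a formula whose environment contains X.
makeFormula <- function() {
  X <- 1:10
  ~ X
}
f <- makeFormula()

# Option 1: parse and evaluate a string in the formula's environment.
lenParse <- function() {
  eval(parse(text = "length(X)"), envir = environment(f))
}

# Option 2: fetch X with get() and take its length in the current frame.
lenGet <- function() {
  length(get("X", envir = environment(f)))
}

# Time 10,000 repetitions of each option (elapsed seconds).
n <- 10000
tParse <- system.time(for (i in seq_len(n)) lenParse())[["elapsed"]]
tGet <- system.time(for (i in seq_len(n)) lenGet())[["elapsed"]]
```

Both functions return the same length; only the elapsed times differ.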
>>
>> Personally I'd go for option 2 as that one is easier to read and does the
>> job nicely, but with these functions I'm always a bit afraid that I'm
>> overlooking important details or side effects (possibly memory issues
>> when working with larger data).
>>
>> Does anybody have an idea what the dangers of these methods are, and which
>> one is the most robust?
>>
>> Thank you
>> Joris
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Mathematical Modelling, Statistics and Bio-Informatics
>>
>> tel : +32 9 264 59 87
>> Joris.Meys at Ugent.be