[Rd] Most efficient way to check the length of a variable mentioned in a formula.
William Dunlap
wdunlap at tibco.com
Fri Oct 17 20:57:30 CEST 2014
I would use eval(), but I think that most formula-using functions do
it more like the following.
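getRHSLength <- function(formula, data = parent.frame())
{
    # The last element of a formula is its right-hand side.
    rhsExpr <- formula[[length(formula)]]
    # Look in 'data' first; 'enclos' applies when 'data' is a list or data frame.
    rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
    # Call length() here, not inside the evaluation frame.
    length(rhsValue)
}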
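As a small illustration of the indexing formula[[length(formula)]] (a sketch only; the names f1, f2, rhs1 and rhs2 are made up for this example): a one-sided formula is a call of length 2 and a two-sided formula is a call of length 3, so taking the last element picks out the right-hand side in both cases.

```r
f1 <- ~ X          # one-sided: elements are `~` and X
f2 <- Y ~ X + Z    # two-sided: elements are `~`, Y and X + Z

length(f1)                 # 2
length(f2)                 # 3

rhs1 <- f1[[length(f1)]]   # the symbol X
rhs2 <- f2[[length(f2)]]   # the call X + Z
```

Note that for a right-hand side such as X + Z, eval() would compute the sum before length() is taken, so the result is the length of the sum rather than of a single variable.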
* Use eval() instead of get() so that you find variables that are in
  ancestral environments of envir (if envir is an environment), not just
  in envir itself.
* Evaluate only the expression from the formula in the non-standard
  evaluation frame, and call length() in the current frame. Otherwise, if
  envir inherits directly from emptyenv(), the 'length' function will not
  be found.
* Use envir=data so that it looks in the data argument first for variables.
* The enclos argument is used only if envir is not an environment (for
  example, a list or data frame); it is where eval() looks for variables
  that are not in envir.
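The point about calling length() in the current frame can be sketched as follows. This is only an illustration under assumed names (e, n_ok and res are not from the code above): an environment whose parent is emptyenv() holds X but cannot see any base functions.

```r
# An environment holding X whose parent is emptyenv(),
# so no base functions are visible from inside it.
e <- new.env(parent = emptyenv())
assign("X", 1:3, envir = e)

# Evaluate only the variable inside e; length() runs in the calling frame.
n_ok <- length(eval(quote(X), envir = e))

# Evaluate the whole call inside e: this fails because the function
# 'length' cannot be found from e. The error message is captured as text.
res <- tryCatch(eval(quote(length(X)), envir = e),
                error = function(err) conditionMessage(err))
```

Here n_ok is 3, while res holds the error message from the failed lookup of 'length'.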
Here are some examples:
> X <- 1:10
> getRHSLength(~X)
[1] 10
> getRHSLength(~X, data=data.frame(X=1:2))
[1] 2
> getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
[1] 4
> getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
[1] 2
> getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
[1] 10
> getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
Error in eval(expr, envir, enclos) : object 'X' not found
I think you will see the same lookups if you try analogous things with lm().
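A rough sketch of what such analogous calls to lm() might look like, assuming made-up response values and a hypothetical helper makeFormula(); nobs() reports how many observations each fit used:

```r
X <- 1:10
Y <- c(1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 9.1, 9.9)

# No data argument: variables come from the formula's environment.
fit1 <- lm(Y ~ X)

# Variables are looked up in 'data' first.
fit2 <- lm(Y ~ X, data = data.frame(X = 1:4, Y = c(1.1, 2.3, 2.9, 4.2)))

# A formula created inside a function carries that function's environment,
# so its local X and Y are used instead of the ones defined above.
makeFormula <- function() {
    X <- 1:5
    Y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
    Y ~ X
}
fit3 <- lm(makeFormula())

nobs(fit1)   # 10
nobs(fit2)   # 4
nobs(fit3)   # 5
```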
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> Dear R gurus,
>
> I need to know the length of a variable (let's call that X) that is
> mentioned in a formula. So obviously I look for the environment from which
> the formula is called and then I have two options:
>
> - using eval(parse(text='length(X)'), envir=environment(formula))
>
> - using length(get('X', envir=environment(formula)))
>
> A bit of benchmarking showed that the first option is about 20 times
> slower, to the extent that over 10,000 repetitions the second option
> saves more than half a second. So speed is not really an issue here.
>
> Personally I'd go for option 2, as it is easier to read and does the
> job nicely, but with these functions I'm always a bit afraid that I'm
> overlooking important details or side effects (possibly memory issues
> when working with larger data).
>
> Does anybody have an idea what the dangers of these methods are, and
> which one is the most robust?
>
> Thank you
> Joris
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be