[Rd] Most efficient way to check the length of a variable mentioned in a formula.

William Dunlap wdunlap at tibco.com
Fri Oct 17 23:36:08 CEST 2014


In my example function I did not evaluate the formula either, just a part of it.
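
To make that concrete, here is a rough sketch of evaluating one piece of a formula rather than the whole thing. The helper name varLength and its arguments i and data are made up for illustration. The point is only that a single variable can be looked up even when the rest of the formula cannot be evaluated yet.

```r
# Hypothetical helper (not from any package): length of the i-th
# variable named in a formula, without evaluating the formula itself.
varLength <- function(formula, i = 1L, data = NULL) {
    vars <- all.vars(formula)              # variable names only, no function names
    value <- eval(as.name(vars[i]), envir = data,
                  enclos = environment(formula))
    length(value)
}

X <- 1:10
f <- y ~ notDefinedYet(X)                  # calling notDefinedYet() would fail

n1 <- varLength(f, i = 2L)                              # X from environment(f)
n2 <- varLength(f, i = 2L, data = data.frame(X = 1:2))  # X from data
```

Here n1 should be 10 and n2 should be 2, since variables in data are looked up before those in environment(formula).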

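If you leave off the envir and enclos arguments to eval in your
function, eval evaluates in the function's own frame, so a name in the
formula can pick up one of the function's local variables and you get
surprising (wrong) results.  E.g.,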
  > afun(y ~ varnames)
  [[1]]
   [1] 10  9  8  7  6  5  4  3  2  1

  [[2]]
  [1] "y"        "varnames"
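
The same effect can be reproduced in isolation. The two functions below are hypothetical and exist only to contrast eval() with and without an explicit envir. Both define a local variable called varnames, as your afun does.

```r
# Hypothetical functions, only to contrast the two calls to eval().
evalDefault <- function(formula) {
    varnames <- all.vars(formula)
    eval(formula[[length(formula)]])       # envir defaults to this function's frame
}
evalInFormulaEnv <- function(formula) {
    varnames <- all.vars(formula)
    eval(formula[[length(formula)]], envir = environment(formula))
}

varnames <- c(2.5, 3.5)                    # the caller's own 'varnames'
f <- ~ varnames
r1 <- evalDefault(f)                       # picks up the local character vector
r2 <- evalInFormulaEnv(f)                  # picks up the caller's numeric vector
```

With the default, r1 is the string "varnames" from inside the function. With envir=environment(formula), r2 is the numeric vector the caller meant.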

If you want to use the variables in data or environment(formula) and
some functions defined in your function, then you could make a child
environment of environment(formula), put your locally defined
functions in it, and use the child environment in the call to eval.
E.g., your code would become
afun2 <- function(formula, ...){

    varnames <- all.vars(formula)
    fenv <- environment(formula)

    n <- length(eval(as.name(varnames[1]), envir=fenv))
    childEnv <- new.env(parent=fenv)
    childEnv$fun <- function(x) x/n

    myterms <- terms(formula)
    eval(attr(myterms, 'variables'), envir=childEnv)
}
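
The child-environment idea is not specific to afun2. As a sketch only, with evalWithHelpers and tenth as invented names, the pattern can be written so that any list of helper functions is made visible to the formula without modifying environment(formula):

```r
# Hypothetical helper: evaluate the right-hand side of a formula with
# some extra functions visible, without touching environment(formula).
evalWithHelpers <- function(formula, helpers) {
    child <- new.env(parent = environment(formula))
    for (nm in names(helpers)) assign(nm, helpers[[nm]], envir = child)
    eval(formula[[length(formula)]], envir = child)
}

x <- 1:10
res <- evalWithHelpers(~ tenth(x), list(tenth = function(v) v / 10))
leaked <- exists("tenth")                  # is the helper visible to the caller?
```

Because the helpers live only in the child environment, the function's own local variables (child, nm, helpers) are not on the lookup path, and nothing is added to the caller's environment.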

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Oct 17, 2014 at 1:50 PM, Joris Meys <jorismeys at gmail.com> wrote:
> Thank you both, great ideas.  William, I see the point of using eval, but
> the problem is that I can't evaluate the formula itself yet. I need to know
> the length of these variables to create a function that is used to evaluate.
> So if I try to evaluate the formula in some way before I created the
> function, it will just return an error.
>
> Now I use the attribute variables of the formula terms to get the variables
> that -after some more manipulation- eventually will be the model matrix.
> Something like this :
>
> afun <- function(formula, ...){
>
>     varnames <- all.vars(formula)
>     fenv <- environment(formula)
>
>     txt <- paste('length(',varnames[1],')')
>     n <- eval(parse(text=txt), envir=fenv)
>
>     fun <- function(x) x/n
>
>     myterms <- terms(formula)
>     eval(attr(myterms, 'variables'))
>
> }
>
> And that should give:
>
> > x <- 1:10
> > y <- 10:1
> > z <- 11:20
> > afun(z ~ fun(x) + y)
> [[1]]
>  [1] 11 12 13 14 15 16 17 18 19 20
>
> [[2]]
>  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
>
> [[3]]
>  [1] 10  9  8  7  6  5  4  3  2  1
>
> It might be I'm walking to Paris over Singapore, but I couldn't find a
> better way to do it.
>
> Cheers
> Joris
>
> On Fri, Oct 17, 2014 at 10:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>
>> I got the default value for getRHSLength's data argument wrong - it
>> should be NULL, not parent.frame().
>>    getRHSLength <- function (formula, data = NULL)
>>    {
>>        rhsExpr <- formula[[length(formula)]]
>>        rhsValue <- eval(rhsExpr, envir = data,
>>                         enclos = environment(formula))
>>        length(rhsValue)
>>    }
>> so that the function firstHalf is found in the following
>>    > X <- 1:10
>>    > getRHSLength((function(){
>>    +     firstHalf <- function(x) x[seq_len(floor(length(x)/2))]
>>    +     ~firstHalf(X)
>>    + })())
>>    [1] 5
>>
>>
>> On Fri, Oct 17, 2014 at 11:57 AM, William Dunlap <wdunlap at tibco.com> wrote:
>> > I would use eval(), but I think that most formula-using functions do
>> > it more like the following.
>> >
>> > getRHSLength <-
>> > function (formula, data = parent.frame())
>> > {
>> >     rhsExpr <- formula[[length(formula)]]
>> >     rhsValue <- eval(rhsExpr, envir = data,
>> >                      enclos = environment(formula))
>> >     length(rhsValue)
>> > }
>> >
>> > * use eval() instead of get() so you will find variables that are in
>> >   ancestral environments of envir (if envir is an environment), not
>> >   just envir itself.
>> > * evaluate only the expression from the formula in the non-standard
>> >   evaluation frame and call length() in the current frame.  Otherwise,
>> >   if envir inherits directly from emptyenv(), the 'length' function
>> >   will not be found.
>> > * use envir=data so it looks first in the data argument for variables
>> > * the enclos argument is used if envir is not an environment and is
>> >   used to find variables that are not in envir.
>> >
>> > Here are some examples:
>> >   > X <- 1:10
>> >   > getRHSLength(~X)
>> >   [1] 10
>> >   > getRHSLength(~X, data=data.frame(X=1:2))
>> >   [1] 2
>> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
>> >   [1] 4
>> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
>> >   [1] 2
>> >   > getRHSLength((function(){X <- 1:4; ~X})(),
>> >   +              data=list2env(data.frame()))
>> >   [1] 10
>> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
>> >   Error in eval(expr, envir, enclos) : object 'X' not found
>> >
>> > I think you will see the same lookups if you try analogous things with
>> > lm().
>> >
>> >
>> > On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
>> >> Dear R gurus,
>> >>
>> >> I need to know the length of a variable (let's call that X) that is
>> >> mentioned in a formula. So obviously I look for the environment from
>> >> which the formula is called and then I have two options:
>> >>
>> >> - using eval(parse(text='length(X)'),
>> >>              envir=environment(formula))
>> >>
>> >> - using length(get('X',
>> >>                    envir=environment(formula)))
>> >>
>> >> A bit of benchmarking showed that the first option is about 20 times
>> >> slower, to the extent that if I repeat it 10,000 times I save more
>> >> than half a second. So speed is not really an issue here.
>> >>
>> >> Personally I'd go for option 2 as that one is easier to read and does
>> >> the job nicely, but with these functions I'm always a bit afraid that
>> >> I'm overlooking important details or side effects here (possibly
>> >> memory issues when working with larger data).
>> >>
>> >> Does anybody have an idea what the dangers of these methods are, and
>> >> which one is the most robust?
>> >>
>> >> Thank you
>> >> Joris
>> >>
>> >> --
>> >> Joris Meys
>> >> Statistical consultant
>> >>
>> >> Ghent University
>> >> Faculty of Bioscience Engineering
>> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
>> >>
>> >> tel : +32 9 264 59 87
>> >> Joris.Meys at Ugent.be
>> >> -------------------------------
>> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


