[Rd] Most efficient way to check the length of a variable mentioned in a formula.

Joris Meys jorismeys at gmail.com
Fri Oct 17 22:50:53 CEST 2014


Thank you both, great ideas.  William, I see the point of using eval, but
the problem is that I can't evaluate the formula itself yet. I need to know
the length of these variables to create a function that is used in the
evaluation. So if I try to evaluate the formula in some way before I have
created that function, it will just return an error.

Now I use the 'variables' attribute of the formula's terms to get the
variables that, after some more manipulation, will eventually become the
model matrix. Something like this:

afun <- function(formula, ...){

    varnames <- all.vars(formula)
    fenv <- environment(formula)

    txt <- paste('length(',varnames[1],')')
    n <- eval(parse(text=txt), envir=fenv)

    fun <- function(x) x/n

    myterms <- terms(formula)
    eval(attr(myterms, 'variables'))

}

And that should give:

> x <- 1:10
> y <- 10:1
> z <- 11:20
> afun(z ~ fun(x) + y)
[[1]]
 [1] 11 12 13 14 15 16 17 18 19 20

[[2]]
 [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

[[3]]
 [1] 10  9  8  7  6  5  4  3  2  1

It might be I'm walking to Paris over Singapore, but I couldn't find a
better way to do it.
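
A possible variation on afun() above, as a rough sketch rather than a
definitive answer: it looks the variable up with get() instead of
parse(text=...), and evaluates the terms in a child environment of the
formula's environment, so that variables local to the place where the
formula was created are also found. The names afun2, evalenv and make are
made up for this illustration.

```r
# Sketch only: afun2, evalenv and make are hypothetical names.
afun2 <- function(formula, ...) {
    fenv <- environment(formula)
    varnames <- all.vars(formula)

    # Look up the first variable by name, starting in the formula's environment
    n <- length(get(varnames[1], envir = fenv))

    # Child environment of fenv that holds the helper; the variables in the
    # formula are still found through the parent chain
    evalenv <- new.env(parent = fenv)
    evalenv$fun <- function(x) x / n

    # attr(, "variables") is an unevaluated call to list(); evaluate it
    eval(attr(terms(formula), "variables"), envir = evalenv)
}

x <- 1:10
y <- 10:1
z <- 11:20
res <- afun2(z ~ fun(x) + y)

# A formula built inside a function, with variables local to that function
make <- function() { a <- 1:4; b <- 4:1; b ~ fun(a) }
res2 <- afun2(make())
```

Here res should hold the same three vectors as the output shown above, and
res2 should hold b and a/4, even though a and b do not exist in the global
environment.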

Cheers
Joris

On Fri, Oct 17, 2014 at 10:16 PM, William Dunlap <wdunlap at tibco.com> wrote:

> I got the default value for getRHSLength's data argument wrong - it
> should be NULL, not parent.frame().
>    getRHSLength <- function (formula, data = NULL)
>    {
>        rhsExpr <- formula[[length(formula)]]
>        rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
>        length(rhsValue)
>    }
> so that the function firstHalf is found in the following
>    > X <- 1:10
>    > getRHSLength((function(){firstHalf<-function(x)x[seq_len(floor(length(x)/2))]; ~firstHalf(X)})())
>    [1] 5
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, Oct 17, 2014 at 11:57 AM, William Dunlap <wdunlap at tibco.com> wrote:
> > I would use eval(), but I think that most formula-using functions do
> > it more like the following.
> >
> > getRHSLength <-
> > function (formula, data = parent.frame())
> > {
> >     rhsExpr <- formula[[length(formula)]]
> >     rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
> >     length(rhsValue)
> > }
> >
> > * use eval() instead of get() so you will find variables that are in
> > ancestral environments of envir (if envir is an environment), not just
> > in envir itself.
> > * just evaluate the stuff in the formula using the non-standard
> > evaluation frame, and call length() in the current frame.  Otherwise, if
> > envir inherits directly from emptyenv() the 'length' function will not
> > be found.
> > * use envir=data so it looks first in the data argument for variables
> > * the enclos argument is used if envir is not an environment and is used
> > to find variables that are not in envir.
> >
> > Here are some examples:
> >   > X <- 1:10
> >   > getRHSLength(~X)
> >   [1] 10
> >   > getRHSLength(~X, data=data.frame(X=1:2))
> >   [1] 2
> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
> >   [1] 4
> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
> >   [1] 2
> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
> >   [1] 10
> >   > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
> >   Error in eval(expr, envir, enclos) : object 'X' not found
> >
> > I think you will see the same lookups if you try analogous things with
> > lm().
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> >> Dear R gurus,
> >>
> >> I need to know the length of a variable (let's call that X) that is
> >> mentioned in a formula. So obviously I look for the environment from
> >> which the formula is called and then I have two options:
> >>
> >> - using eval(parse(text='length(X)'),
> >>                     envir=environment(formula) )
> >>
> >> - using length(get('X',
> >>             envir=environment(formula) ))
> >>
> >> A bit of benchmarking showed that the first option is about 20 times
> >> slower, to the extent that over 10,000 repetitions the second option
> >> saves more than half a second. So speed is not really an issue here.
> >>
> >> Personally I'd go for option 2 as that one is easier to read and does
> >> the job nicely, but with these functions I'm always a bit afraid that
> >> I'm overlooking important details or side effects here (possibly memory
> >> issues when working with larger data).
> >>
> >> Does anybody have an idea what the dangers of these methods are, and
> >> which one is the most robust?
> >>
> >> Thank you
> >> Joris
> >>
> >> --
> >> Joris Meys
> >> Statistical consultant
> >>
> >> Ghent University
> >> Faculty of Bioscience Engineering
> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
> >>
> >> tel : +32 9 264 59 87
> >> Joris.Meys at Ugent.be
> >> -------------------------------
> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >>
>
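
The lookup order described in the quoted messages can be checked with a
short script. This is only a sketch: it restates getRHSLength() with the
corrected default (data = NULL) and turns a few of the quoted examples into
assertions. The name local_formula is introduced here for illustration.

```r
# getRHSLength() as quoted above, with the corrected default for data
getRHSLength <- function(formula, data = NULL) {
    rhsExpr <- formula[[length(formula)]]
    rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
    length(rhsValue)
}

X <- 1:10

# Formula created inside a function that has its own X (hypothetical name)
local_formula <- (function() { X <- 1:4; ~X })()

# No data: X is found through the formula's environment
stopifnot(getRHSLength(~X) == 10)
stopifnot(getRHSLength(local_formula) == 4)

# A column in data takes precedence over the formula's environment
stopifnot(getRHSLength(~X, data = data.frame(X = 1:2)) == 2)
stopifnot(getRHSLength(local_formula, data = data.frame(X = 1:2)) == 2)

# If data is an environment, enclos is ignored; emptyenv() has no X
res <- try(getRHSLength(local_formula, data = emptyenv()), silent = TRUE)
stopifnot(inherits(res, "try-error"))
```

If all assertions pass, the script runs silently.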





