[Rd] Most efficient way to check the length of a variable mentioned in a formula.
Joris Meys
jorismeys at gmail.com
Fri Oct 17 22:50:53 CEST 2014
Thank you both, great ideas. William, I see the point of using eval, but
the problem is that I can't evaluate the formula itself yet. I need to know
the length of these variables to create a function that is used inside the
formula. So if I try to evaluate the formula in some way before I have created
that function, it will just return an error.
Right now I use the 'variables' attribute of the formula's terms object to get
the variables that, after some more manipulation, will eventually become the
model matrix. Something like this:
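afun <- function(formula, ...){
  varnames <- all.vars(formula)   # e.g. c("z", "x", "y")
  fenv <- environment(formula)    # environment where the formula was created
  # length of the first variable, evaluated in the formula's environment
  txt <- paste('length(', varnames[1], ')')
  n <- eval(parse(text = txt), envir = fenv)
  fun <- function(x) x/n          # helper that is used inside the formula
  myterms <- terms(formula)
  # the 'variables' attribute is an unevaluated call like list(z, fun(x), y);
  # it is evaluated here, in afun's frame, where fun() is defined
  eval(attr(myterms, 'variables'))
}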
And that should give:
> x <- 1:10
> y <- 10:1
> z <- 11:20
> afun(z ~ fun(x) + y)
[[1]]
[1] 11 12 13 14 15 16 17 18 19 20
[[2]]
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
[[3]]
[1] 10 9 8 7 6 5 4 3 2 1
It might be that I'm walking to Paris via Singapore, but I couldn't find a
better way to do it.
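A minimal sketch of the same idea, with two changes that are assumptions of this sketch rather than anything proposed in the thread: the length is fetched with get() instead of eval(parse()), and the 'variables' call is evaluated in a fresh environment that holds fun and whose parent is the formula's environment, so x, y and z are looked up where the formula was created rather than from the function's own frame. The names afun2 and make_f are hypothetical. Like the original, it assumes the length of the first variable is the one that matters.

```r
afun2 <- function(formula, ...) {
  varnames <- all.vars(formula)
  fenv <- environment(formula)
  # get() looks in fenv and, since inherits = TRUE by default, in its enclosures
  n <- length(get(varnames[1], envir = fenv))
  fun <- function(x) x / n
  # an unevaluated call such as list(z, fun(x), y)
  vars <- attr(terms(formula), "variables")
  # evaluate in a fresh environment that holds 'fun' and whose parent is the
  # formula's environment, so the data variables are found where the formula was made
  evalenv <- new.env(parent = fenv)
  assign("fun", fun, envir = evalenv)
  eval(vars, envir = evalenv)
}

x <- 1:10
y <- 10:1
z <- 11:20
res <- afun2(z ~ fun(x) + y)

# a formula created inside another function also works, because the lookup
# starts from environment(formula) rather than from afun2's own frame
make_f <- function() { a <- 1:4; b <- 4:1; b ~ fun(a) }
res2 <- afun2(make_f())
```

The design choice here is mainly about scoping: evaluating in afun's own frame only works when the data variables happen to be visible from where afun was defined, whereas anchoring the evaluation environment on environment(formula) follows the same lookup rule that the formula-handling functions discussed below use.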
Cheers
Joris
On Fri, Oct 17, 2014 at 10:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I got the default value for getRHSLength's data argument wrong: it
> should be NULL, not parent.frame().
> getRHSLength <- function (formula, data = NULL)
> {
>     rhsExpr <- formula[[length(formula)]]
>     rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
>     length(rhsValue)
> }
> so that the function firstHalf is found in the following:
> > X <- 1:10
> > getRHSLength((function(){firstHalf <- function(x) x[seq_len(floor(length(x)/2))]; ~firstHalf(X)})())
> [1] 5
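A small sketch contrasting the two defaults, using only base R. getRHSLength is the corrected version from the message above; getRHSLength_pf is a hypothetical name for the same body with the earlier default data = parent.frame(), kept only for comparison.

```r
# corrected version: data defaults to NULL
getRHSLength <- function(formula, data = NULL) {
  rhsExpr <- formula[[length(formula)]]
  rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
  length(rhsValue)
}

# same body with the earlier default (hypothetical name, for comparison only)
getRHSLength_pf <- function(formula, data = parent.frame()) {
  rhsExpr <- formula[[length(formula)]]
  rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
  length(rhsValue)
}

X <- 1:10
f <- (function() {
  firstHalf <- function(x) x[seq_len(floor(length(x) / 2))]
  ~firstHalf(X)
})()

# envir = NULL is treated like an empty list, so lookup goes straight to
# enclos, the formula's environment, where firstHalf lives
n_null <- getRHSLength(f)

# with data = parent.frame(), envir is an environment, so enclos is ignored
# and firstHalf is not visible from the caller's frame
failed <- tryCatch({ getRHSLength_pf(f); FALSE }, error = function(e) TRUE)
```

With the NULL default, n_null is 5, matching the example above; with the parent.frame() default the call fails because firstHalf cannot be found.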
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, Oct 17, 2014 at 11:57 AM, William Dunlap <wdunlap at tibco.com> wrote:
> > I would use eval(), but I think that most formula-using functions do
> > it more like the following.
> >
> > getRHSLength <-
> > function (formula, data = parent.frame())
> > {
> >     rhsExpr <- formula[[length(formula)]]
> >     rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
> >     length(rhsValue)
> > }
> >
> > * Use eval() instead of get() so that you will find variables that are in
> >   ancestral environments of envir (if envir is an environment), not just
> >   in envir itself.
> > * Evaluate only the stuff in the formula in the non-standard evaluation
> >   frame, and call length() in the current frame. Otherwise, if envir
> >   inherits directly from emptyenv(), the 'length' function will not be found.
> > * Use envir=data so that it looks first in the data argument for variables.
> > * The enclos argument is used only if envir is not an environment; it is
> >   used to find variables that are not in envir.
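The lookup rules in the list above can be checked directly with base R's eval(), get() and exists(). In this sketch, e_formula, isolated and child are hypothetical environments standing in for environment(formula) and for a data environment; note that get() also searches enclosing environments unless inherits = FALSE is given.

```r
# hypothetical environment standing in for environment(formula)
e_formula <- new.env()
assign("X", 1:4, envir = e_formula)

# envir is a list (or data frame): names missing from it are looked up in enclos
v1 <- eval(quote(X), envir = list(), enclos = e_formula)
v2 <- eval(quote(X), envir = list(X = 1:2), enclos = e_formula)

# envir is an environment: enclos is ignored and lookup follows envir's parents
isolated <- new.env(parent = emptyenv())
assign("X", 1:3, envir = isolated)
v3 <- eval(quote(X), envir = isolated)

# evaluating length(X) inside 'isolated' fails, because 'length' is not
# reachable from emptyenv() ...
err <- tryCatch(eval(quote(length(X)), envir = isolated),
                error = function(e) conditionMessage(e))
# ... whereas evaluating only X there and calling length() here works
n <- length(eval(quote(X), envir = isolated))

# get() searches enclosing environments too, unless inherits = FALSE
child <- new.env(parent = e_formula)
g1 <- get("X", envir = child)
g2 <- exists("X", envir = child, inherits = FALSE)
```

Here v1 is 1:4 (taken from enclos), v2 is 1:2 (the list wins), v3 is 1:3, err holds a "could not find function" message, n is 3, g1 is 1:4 and g2 is FALSE.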
> >
> > Here are some examples:
> > > X <- 1:10
> > > getRHSLength(~X)
> > [1] 10
> > > getRHSLength(~X, data=data.frame(X=1:2))
> > [1] 2
> > > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
> > [1] 4
> > > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
> > [1] 2
> > > getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
> > [1] 10
> > > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
> > Error in eval(expr, envir, enclos) : object 'X' not found
> >
> > I think you will see the same lookups if you try analogous things with lm().
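A small illustration of that analogy, assuming only base R's lm() and fitted(): without a data argument, lm() takes X and Y from the formula's environment; with a data frame, the columns there take precedence. The function make_formula and the numbers are hypothetical.

```r
# hypothetical helper: the formula's environment holds X and Y of length 4
make_formula <- function() {
  X <- 1:4
  Y <- c(2.1, 3.9, 6.2, 7.8)
  Y ~ X
}
f <- make_formula()

# no data: variables are found in environment(f)
fit1 <- lm(f)
# data supplied: its columns are used first
fit2 <- lm(f, data = data.frame(X = 1:3, Y = c(1, 2, 4)))

n1 <- length(fitted(fit1))
n2 <- length(fitted(fit2))
```

Here n1 is 4 and n2 is 3, mirroring the getRHSLength examples with and without data.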
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> >> Dear R gurus,
> >>
> >> I need to know the length of a variable (let's call it X) that is
> >> mentioned in a formula. So obviously I look for the environment in which
> >> the formula was created, and then I have two options:
> >>
> >> - using eval(parse(text='length(X)'), envir=environment(formula))
> >>
> >> - using length(get('X', envir=environment(formula)))
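A sketch of both options, with envir passed to get() itself, plus a third variant that evaluates only a symbol built with as.name(); that third variant and the object names are assumptions of this sketch, not something from the thread. It also shows one concrete hazard of the text-based route: a variable whose name is not syntactic produces text that does not parse.

```r
f <- (function() { X <- 1:4; ~X })()
fenv <- environment(f)

# option 1: build code from text and evaluate it
n1 <- eval(parse(text = "length(X)"), envir = fenv)
# option 2: fetch the object by name
n2 <- length(get("X", envir = fenv))
# a third variant: evaluate only the symbol, call length() here
n3 <- length(eval(as.name("X"), envir = fenv))

# pitfall of the text-based route: a non-syntactic variable name
g <- (function() { `my var` <- 1:3; ~`my var` })()
nm <- all.vars(g)  # "my var"
txt_ok <- tryCatch({
  eval(parse(text = paste0("length(", nm, ")")), envir = environment(g))
  TRUE
}, error = function(e) FALSE)  # "length(my var)" does not parse
n4 <- length(get(nm, envir = environment(g)))
```

All three variants give 4 for the first formula; for the second, the text-based route fails while get() returns an object of length 3.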
> >>
> >> A bit of benchmarking showed that the first option is about 20 times
> >> slower, but only to the extent that repeating it 10,000 times with the
> >> second option saves a bit more than half a second. So speed is not really
> >> an issue here.
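A rough way to run such a comparison with base R's system.time(). The object size and the repetition count are arbitrary choices for this sketch, and timings differ from machine to machine, so no numbers are shown here.

```r
f <- (function() { X <- runif(1e5); ~X })()
fenv <- environment(f)
reps <- 10000

t_parse <- system.time(
  for (i in seq_len(reps)) eval(parse(text = "length(X)"), envir = fenv)
)
t_get <- system.time(
  for (i in seq_len(reps)) length(get("X", envir = fenv))
)

# elapsed wall-clock seconds for each approach
timings <- c(parse = t_parse[["elapsed"]], get = t_get[["elapsed"]])
print(timings)
```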
> >>
> >> Personally I'd go for option 2, as it is easier to read and does the
> >> job nicely, but with these functions I'm always a bit afraid that I'm
> >> overlooking important details or side effects (possibly memory issues
> >> when working with larger data).
> >>
> >> Does anybody have an idea what the dangers of these methods are, and
> >> which one is the most robust?
> >>
> >> Thank you
> >> Joris
> >>
> >> --
> >> Joris Meys
> >> Statistical consultant
> >>
> >> Ghent University
> >> Faculty of Bioscience Engineering
> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
> >>
> >> tel : +32 9 264 59 87
> >> Joris.Meys at Ugent.be
> >> -------------------------------
> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >>
>