[Rd] works in R-1.1.1 but not in R-development; why?

Luke Tierney luke@goose.stat.umn.edu
Thu, 12 Oct 2000 14:03:57 -0500 (CDT)


Peter Dalgaard BSA wrote:
> Ramon Diaz-Uriarte <ramon-diaz@teleline.es> writes:
> 
> > Dear All,
> > 
> > A library (PHYLOGR) that passed the usual tests in R-1.1.1 gives errors with
> > R-devel; my (mis?)understanding of scoping rules is
> > that it should have worked in both. The problems seem related to using the
> > name of the data frame for extracting weights or subsets within a function
> > call. The problems can be reproduced as follows:
> > 
> > **********************
> > 
> > datai <- data.frame( y = rnorm(10), x1 = rnorm(10), x2 = abs(rnorm(10)),
> >                     x3 = rep(seq(1,5),2), counter = rep(c(1,2),c(5,5)))
> > 
> > formula <- as.formula(y ~ x1)
> > 
> > 
> > # the following fails in R-1.2.0 but not in R-1.1.1
> > # > Error in eval(expr, envir, enclos) : Object "datos" not found
> > lapply(split(datai,datai$counter),
> >        function(datos,formula) {lm(formula = formula, data = datos,
> >                                    weights = datos$x2)},
> >        formula = formula) 
> 
> Ow!... This happens because of a change that makes formulas capture
> their environment of definition.
> 
> A workaround is to explicitly set the environment of the formula to
> the current environment, like this:
> 
> lapply(split(datai,datai$counter),
>        function(datos,formula) {
>            environment(formula) <- environment()
>            lm(formula = formula, data = datos, weights = datos$x2)
>        },
>        formula = formula) 
> 
> but I bet Luke wants to comment on this...
> 

Not really, but I guess I have no choice :-).  Here is my take on this:

The simple solution is to use

lapply(split(datai,datai$counter),
       function(datos,formula) {lm(formula = formula, data = datos,
                                   weights = x2)},
       formula = formula) 

i.e. use x2 instead of datos$x2 as the weights argument.  This works
in both 1.1.1 and in the devel branch.

A long-winded explanation:

What makes using lm and friends in functions difficult is that some of
their arguments are used for value and some are used for expression (for
lack of better terms).  Arguments used for value are ordinary function
arguments that are evaluated internally by the standard function
evaluation process; the data argument is one.  (You can almost think of
a value argument as being computed before the call, with only its value
passed in.)  Expression arguments are not evaluated directly.  Instead
their expressions are captured (by substitute or something similar),
then examined, possibly modified, and possibly evaluated through an
explicit call to eval in some context.  The weights argument is an
expression argument.
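To make the distinction concrete, here is a minimal sketch with a hypothetical function get_weights (not part of R, and not how lm is actually implemented). Its data argument is used for value; its w argument is used for expression: the expression is captured with substitute() and later handed to eval(), which looks in the data frame first and then in the caller's environment.

```r
# Hypothetical helper, for illustration only (not part of R or stats).
# 'data' is a value argument: it is evaluated in the ordinary way.
# 'w' is an expression argument: only its unevaluated expression is used.
get_weights <- function(data, w) {
  w_expr <- substitute(w)   # capture the expression, e.g. the symbol x2
  caller <- parent.frame()  # environment of whoever called get_weights()
  # Look up variables in 'data' first, then in the caller's environment.
  eval(w_expr, data, caller)
}

d <- data.frame(x2 = c(0.5, 1, 2))
w <- get_weights(d, x2)     # x2 is never evaluated as an ordinary argument
print(w)
```

The call get_weights(d, x2) works even if no variable named x2 exists in the caller, because the symbol is only ever looked up through eval().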

The key to understanding how expression arguments work is knowing, and
perhaps controlling, the environment used to evaluate them (*and*
knowing which arguments they are; the documentation isn't as helpful as
it could be here).  The changes Robert made to lm and related functions
are a first step toward making the context in which expression
arguments are evaluated more rational and controllable, in particular
when no explicit data frame is supplied.

The 1.1.1 rules were to evaluate expression arguments in an
environment consisting of the data frame and the environment of the
caller of lm.  The new rules (at least as intended; they may still
change) are that the evaluation environment consists of the data frame
and the environment in which the formula was constructed.
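The two rules differ only in the enclosure passed to eval(). A sketch, assuming a hypothetical helper compare_rules() that evaluates the weights expression datos$x2 from the report both ways, without calling lm at all:

```r
# Formula created at top level, so its environment is the top level.
f <- y ~ x1

# Hypothetical helper, for illustration only.
compare_rules <- function(datos, formula) {
  e <- quote(datos$x2)                 # the weights expression from the report
  # 1.1.1-style rule: data frame first, then the caller of lm (this frame).
  old <- eval(e, datos, environment())
  # New rule: data frame first, then the environment of the formula.
  new <- tryCatch(eval(e, datos, environment(formula)),
                  error = function(err) conditionMessage(err))
  list(old = old, new = new)
}

dd <- data.frame(y = 1:3, x1 = 4:6, x2 = c(1, 2, 3))
res <- compare_rules(dd, f)
str(res)
```

Under the old rule res$old holds the x2 column; under the new rule the lookup of datos falls through to the formula's environment, and res$new holds the "not found" error message instead.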

Both approaches use the data frame, if supplied, as the first place to
look for variables; they differ only in how they handle variables that
the data frame does not contain.
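A small sketch of that shared first step, using made-up values: a column in the data frame shadows a variable of the same name in the enclosure, and the enclosure is consulted only when the data frame has no such column.

```r
result <- local({
  x2 <- "enclosure"                     # variable in the enclosing environment
  here <- environment()
  with_col    <- data.frame(x2 = c(10, 20))
  without_col <- data.frame(y = 1:2)
  list(
    from_data = eval(quote(x2), with_col, here),    # column x2 is found first
    fallback  = eval(quote(x2), without_col, here)  # no column: enclosure used
  )
})
str(result)
```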

The call used in this example is:

	lm(formula = formula, data = datos, weights = datos$x2)

An explicit data argument is given, but the data frame datos only
contains the components y, x1, x2, x3, and counter.  The expression
provided as the weights argument is datos$x2.  When that expression is
evaluated, datos is not found in the data frame, so lookup falls back
to the default environment.  Under 1.1.1 that is the caller's
environment, where datos is the function argument, and you get what you
want.  In the devel branch it is the environment in which the formula
was created, which here is the global environment, and there is no
datos there.
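The failure can be reproduced directly. The sketch below reruns the call from the report, catching the error instead of stopping; set.seed() is added only so the data are reproducible. As far as I know, current R releases still behave the way the devel branch is described here, because model.frame() evaluates the extra arguments in the data with the formula's environment as the enclosure; treat that as an assumption to check against your version.

```r
set.seed(1)  # added only for reproducibility
datai <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = abs(rnorm(10)),
                    x3 = rep(seq(1, 5), 2), counter = rep(c(1, 2), c(5, 5)))
formula <- as.formula(y ~ x1)   # created at top level

# Same call as in the report; return the error message instead of stopping.
outcome <- tryCatch(
  lapply(split(datai, datai$counter),
         function(datos, formula) {
           lm(formula = formula, data = datos, weights = datos$x2)
         },
         formula = formula),
  error = function(err) conditionMessage(err))
print(outcome)
```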

With all that, back to the simple solution: The weights argument x2 is
an expression argument and is intended to refer to the x2 component of
the data frame.  When the expression is evaluated, the first place the
evaluation looks for variable values, both in 1.1.1 and in devel, is
the data frame.  So you get the answer you want in both cases.
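Both fixes can be checked side by side. A sketch, reusing the data set-up from the report (with set.seed() added only for reproducibility): the first call uses weights = x2 as suggested here, the second uses Peter's workaround of resetting the formula's environment. Since both weight each subset by its own x2 column, the fitted coefficients should agree.

```r
set.seed(1)  # added only for reproducibility
datai <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = abs(rnorm(10)),
                    x3 = rep(seq(1, 5), 2), counter = rep(c(1, 2), c(5, 5)))
formula <- as.formula(y ~ x1)

# Simple solution: name the column, which is found in the data frame first.
fits_simple <- lapply(split(datai, datai$counter),
                      function(datos, formula) {
                        lm(formula = formula, data = datos, weights = x2)
                      },
                      formula = formula)

# Workaround: make this function's frame the formula's environment,
# so that datos itself can be found when datos$x2 is evaluated.
fits_env <- lapply(split(datai, datai$counter),
                   function(datos, formula) {
                     environment(formula) <- environment()
                     lm(formula = formula, data = datos, weights = datos$x2)
                   },
                   formula = formula)

same_coefs <- isTRUE(all.equal(lapply(fits_simple, coef),
                               lapply(fits_env, coef)))
print(same_coefs)
```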

luke

-- 
Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke@stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu