[R] The stages of standard function evaluation

Thu May 3 14:06:15 CEST 2018

First of all, your message is a little hard to read because you posted 
in HTML.  This list removes the HTML, and often mangles messages, so you 
should always post in plain text.  But in this case your message was 
still pretty readable.

On 02/05/2018 11:04 PM, Andrew Hoerner wrote:
> Dear R Help folks --
> 
> I have been trying to put together a list of the steps or stages of R
> function evaluation, with particular focus on those that have "standard" or
> "nonstandard" forms. This is both for my own edification and also because I
> am thinking of joining the world of R bloggers and have been trying to put
> together some draft posting that might be useful. I seem to have an
> affirmative genius for finding incorrect interpretations of R's evaluation
> rules; I'm trying to make that an asset.
> 
> I am hoping that you can tell me:
> 
> 
>     1. Is this list complete, or are there additional stages I am missing?
>     2. Have I inserted one or more imaginary stages?
>     3. Are the terms I use below to name each stage appropriate, or are
>     there other terms more widely used or recognizable?
>     4. Is the order correct?
> 
> I begin each name with “Standard,” to express my belief that each of these
> things has a usual or default form, but also that (unless I am mistaken)
> almost none of them exist only in a single form true of all R functions. (I
> have marked with an asterisk a few evaluation steps that I think may always
> be followed).
> 
> It is my ultimate goal (which I do not feel at all close to accomplishing)
> to determine a way to mechanically test for “standardness” along each of
> these dimensions, so that each function could be assigned a logical vector
> showing the ways that it is and is not standard. One thing I think is
> conceptually or procedurally difficult about this project is that I think
> “standardness” should be determined by what a function does, rather than by
> how it does it, so that a primitive function that takes unevaluated
> arguments positionally could still have standard matching, scoping, etc.,
> by internal emulation. A related goal is to identify which evaluation steps
> most often use an alternative form, and perhaps determine if there is more
> than one such alternative. Finally, an easier short-term goal is simply to
> find instances of one or more functions with standard and non-standard
> evaluation for each evaluation step.
> 
> For the most part below I am treating the evaluation of closures as the
> standard from which “nonstandard” is defined. However, I do not assume that
> other kinds of functions are automatically nonstandard on any particular
> dimension below. Most of this comes from the R Language Definition, but
> there are numerous places where I am by no means certain that my
> interpretation is correct. I have highlighted some of these below with a
> “??”.
> 
> I look forward to learning from you.
> 
> Warmest regards,
> 
> J. Andrew Hoerner
> 
> 
> *Standard function recognition:* recognizing some or all of a string of code
> as a function. (Part of code line parsing)
> 
> *Standard environment construction:* construction of the execution
> environment, and of pointers to the calling and enclosing environments.
> 
> *Standard function identification:* Get the name of the function, if any

This may be mangling, but it's really hard to tell whether the 3 
paragraphs above are supposed to be steps, headings, or what.  Assuming 
they are steps, the first one is wrong.

The parser looks at a string and breaks it down into tokens and 
subexpressions, making what you later call an AST.  The first step in 
function evaluation is recognizing that something is a function call, 
not recognizing it as a function.  For example, "mean" is the name of a 
function and also an expression evaluating to a function, whereas 
"mean(1:10)" is a function call.
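
To make the distinction concrete, here is a small sketch that uses quote() to look at the parsed expressions without evaluating them.  The variable names are only for illustration.

```r
# quote() returns the parsed expression without evaluating it.
name_expr <- quote(mean)         # a symbol that names a function
call_expr <- quote(mean(1:10))   # a function call

is.name(name_expr)               # the bare name is a symbol
is.call(call_expr)               # the second expression is a call

# The first element of a call is the expression that specifies the function.
fn_expr <- call_expr[[1]]        # here, the symbol `mean`

# Evaluating the symbol yields a function; evaluating the call runs it.
fn     <- eval(name_expr)
result <- eval(call_expr)
```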

Once you have a function call, the next step looks at the expression 
used to specify the function.  In "mean(1:10)", that expression is 
"mean", but it could be an arbitrary R expression.  If it is a name like 
"mean" (or a string), then R looks for an object of mode "function" of 
that name in the current evaluation frame, then in its chain of parent 
(enclosing) environments.  These are not "constructed"; the current 
evaluation frame is always known, and contains a pointer to its parent.  
If the function is specified by a 
more complex expression (e.g. in "fn[[1]](1:10)", the expression is 
"fn[[1]]") then that expression is evaluated.  It needs to return a 
function object or an error will be generated.

So these work:

mean(1:10)
list(mean)[[1]](1:10)
"mean"(1:10)

and these don't:

list("mean")(1:10)
c("mean")(1:10)

So now we have the function.  Its name is irrelevant.
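
As a rough illustration of the lookup described above, the sketch below shadows the name "mean" with a number and then calls it anyway.  The function name f is made up for the example.

```r
# In function position, R looks for an object of mode "function", so a
# non-function binding with the same name is skipped.
f <- function() {
  mean <- 42      # a local variable named "mean" that is not a function
  mean(1:10)      # the lookup passes over the number and finds base::mean
}
res <- f()

# An expression in function position must evaluate to a function object.
# list("mean") evaluates to a list, so the call fails.
err <- tryCatch(
  list("mean")(1:10),
  error = function(e) "not a function"
)
```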
> 
> *Standard function scoping*: Search the current environment and then up
> the chain of enclosing environments until you find the first binding
> environment, an environment where the name of the function is bound, i.e.
> linked to a pointer to the function definition. The binding environment is
> usually (but not always) the same as the defining environment (i.e. the
> enclosing environment when the function is defined). Note that function
> lookup, unlike function argument lookup, necessarily starts from the
> calling environment, because a function does not know what it is – its
> formals, body, and environments – until it is found. Named functions are
> always found by scoping. R never learns "where" they are -- they have to be
> looked up each time. For this reason, anonymous functions must be used in
> place, and called by a function that takes a function as an argument, or by
> (function(formals){body})(actual args)

> 
> *Standard function retrieval:*
> 
> load (??) the function, i.e. transfer the list (??) of formals and defaults
> and the list (??) of expressions that constitute the function body into the
> execution environment

Functions have at least 3 parts, not 2.  They have formals, a body, and 
an environment.  Nowadays they will often have bytecode as well; this is 
a compiled version of the body used in its place during evaluation.
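
The three parts can be inspected directly.  This is a minimal sketch with a made-up function f; the bytecode is not shown because it is not normally accessed from R code.

```r
f <- function(x, y = 2) x + y

fmls <- formals(f)       # pairlist of argument names and default expressions
bdy  <- body(f)          # the body, stored as an unevaluated expression
env  <- environment(f)   # the environment in which f was created

arg_names <- names(fmls) # "x" "y"
default_y <- fmls$y      # the default expression for y, here the constant 2
body_text <- deparse(bdy)
```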

> 
> Note that the function body is parsed at the time the function is created
> (true?? Or is it parsed every time the function is called?)

It is only parsed once.

> 
> *Standard argument matching*:
> 
> assignment of expressions and default arguments to formals via the
> usually-stated matching rules. If matched positionally at call time, the name
> is scoped like an actual argument. Note that giving an argument the same
> name as a formal when calling the function will only match it to that
> formal if matched positionally or by tag, not by name.

I have no idea what you are saying in this paragraph.  Positional versus 
named matching has no effect on scoping.  Arguments specified in the 
call are scoped in the calling frame; default values for arguments are 
scoped in the evaluation frame.
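
A small sketch of that rule, using two made-up functions f and g that each have their own x:

```r
f <- function(a, b = x * 2) {
  x <- 10            # local to f's evaluation frame
  c(a = a, b = b)
}

g <- function() {
  x <- 1             # local to g's frame, which is the calling frame for f
  f(a = x)           # supplied argument: x is looked up in g's frame
}

out <- g()
# a comes from the caller's x (1); the default for b uses f's own x (10 * 2)
```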

> *Standard argument parsing:*
> 
> Converts argument expressions into abstract syntax trees. “Standard”
> argument parsing and promise construction take place before the arguments
> are passed into the body.

You missed a step.  As evaluation starts, a new environment is created, 
the evaluation frame.  Its parent is the environment of the function; it 
is initialized with the formal arguments to the function as promises.

This is true for both standard and non-standard functions.  All 
arguments are parsed, standard or not, producing promises.  They are 
placed in the evaluation frame, not "passed into the body".
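
Here is a rough way to see the evaluation frame from inside a call.  The function f is hypothetical, and the stop() argument is there only to show that the promise for x is never forced.

```r
f <- function(x, y = 1) {
  frame <- environment()   # the evaluation frame of this particular call
  list(
    # the frame's parent is the environment of the function
    same_parent = identical(parent.env(frame), environment(f)),
    # the formals are already bound in the frame (as unevaluated promises)
    formals_present = all(c("x", "y") %in% ls(frame))
  )
}

# ls() does not force promises, so the stop() below is never run.
res <- f(stop("never evaluated"))
```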

> 
> <Or do matching and parsing happen in reverse order?>

No, parse first, match second, put into evaluation frame third.

> 
> *Standard promise construction*:
> 
> Assigning each name (including function names) in an AST to its binding in the
> calling environment (if ordinary) or the execution environment (if default)

No.  Each formal is bound to a promise in the evaluation frame. 
Promises contain an expression (an AST in your terms) and an 
environment.  As previously mentioned, the environment will be the 
calling frame for arguments passed in the call, the evaluation frame for 
arguments specified via defaults.

> (Am I right that the action here for calls and for names is essentially the
> same, and happens at the same time using the same lookup procedure?) Note
> that scoping, in the form of search, applies only when the function is
> called. Formals are matched, but they are never scoped, except that their
> default values are assigned into the function body when the function is
> called and then scoped from there if they are not assigned to a value
> before they are used. Actual arguments on function call that are not found
> in the calling environment are scoped ?? up the call tree ?? until they
> reach the top level, and then up the search path.

No.  Arguments are all treated as promises, i.e. un-evaluated 
expressions with an attached environment.  No search is done until later 
when they are evaluated.
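
One way to see that nothing is looked up until a promise is forced is to pass a name that does not exist.  In this sketch no_such_object is deliberately undefined, and both function names are made up.

```r
ignores_arg <- function(x) "x was never evaluated"
uses_arg    <- function(x) x

# The promise for x is created but never forced, so no lookup happens.
ok <- ignores_arg(no_such_object)

# Here the promise is forced, and only then does the lookup fail.
err <- tryCatch(
  uses_arg(no_such_object),
  error = function(e) "lookup failed when the promise was forced"
)
```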

> 
> “R Language Definition 4.3.3 Argument evaluation:   One of the most
> important things to know about the evaluation of arguments to a function is
> that supplied arguments and default arguments are treated differently. The
> supplied arguments to a function are evaluated in the evaluation frame of
> the calling function. The default arguments to a function are evaluated in
> the evaluation frame of the function.”
> 
> (Note 1: In some places closures are described as capturing their defining
> environment, as if they made and stored a copy. For instance, from the R
> Language Definition 2.1.5: “Any symbols bound in that environment are
> *captured* and available to the function. This combination of the code of
> the function and the bindings in its environment is called a ‘function
> closure’, a term from functional programming theory. In this document we
> generally use the term ‘function’, but use ‘closure’ to emphasize the
> importance of the attached environment.”
> 
> But I think what gets passed in promises are pointers to objects in the
> environment, not the environment in its entirety, nor even the objects the
> pointers point to. These are sought only when the promise is kept, and
> actually copied into the execution environment only if they are
> subsequently altered. If they are only used as function arguments and not
> altered themselves they are, I believe, used in place, without copying.)

No, promises contain expressions, and references (pointers) to environments.
> 
> (Note 2. Assigning an argument to a formal via = creates a default argument
> only in a function definition. When such an assignment is made during a
> function call, the RHS scopes to the calling environment and up, not to the
> function body.)

That sounds correct.
> 
> *Standard body construction.*
> 
> Replace each occurrence of a formal within the function with the value of
> that formal if a constant, and otherwise with the AST identified with that
> formal and any promises it contains (sometimes collectively called the
> actual arguments, as distinct from the formal arguments).

No.  The body is just an expression.  Typically it's a compound 
statement enclosed in braces, but not necessarily.  No substitutions are 
done.  Later when it is evaluated, symbols in that expression will be 
looked up in the evaluation frame.
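
A minimal sketch of this: the body is unchanged by a call, and evaluating it by hand in an environment that binds x gives the same answer.  The hand-built environment only imitates an evaluation frame; it is not what R creates internally.

```r
f <- function(x) x + 1           # the body need not be wrapped in braces

before <- body(f)
called <- f(41)
after  <- body(f)
unchanged <- identical(before, after)   # the call did not rewrite the body

# Imitate an evaluation frame: parent is f's environment, x is bound in it.
frame <- new.env(parent = environment(f))
frame$x <- 41
manual <- eval(body(f), frame)   # x is looked up in frame when evaluated
```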

> 
> *Standard body execution:*
> 
> It is strange to me that everything I know about function evaluation,
> standard or non-standard, seems to be about getting arguments into or out
> of the body of the function. If there is anything strictly internal to body
> execution that can be called standard, I don't know what it is. Here are a
> couple of candidates, but this list seems very incomplete to me:
> 
> *Standard expression sequencing.* Expressions in the body are executed
> sequentially except as that sequence is altered by flow control functions
> (if/else, while, switch, etc.) and block grouping functions ({}).
> 
> *Standard promise triggering:* Recognition that an action on the promise
> that constitutes "use" has taken place and that the promise now needs to be
> fulfilled. (I have never seen a clear statement of exactly what uses do and
> do not trigger fulfillment of a promise).
> 
> *Standard promise fulfillment/Argument scoping*: For names passed via
> formals, scope up the environment hierarchy from the calling environment
> and then up the call chain ??. For default arguments, scope starting in the
> function’s execution environment (the parent of which is the defining
> environment ??), to the first environment where the name exists, and fetch
> the value. For functions, recurse.

This is unnecessarily complex.  Evaluation of the body expression is 
just like evaluation of any other expression.  What is special is that 
the evaluation frame is set as the current frame, and some of the 
objects in it are promises, which have their own special rules.

> 
> Inside of the body, formals are always referred to by the name of the
> formal, not the name of the variables assigned to the formal. When the
> function is called, instances of the formal will be replaced by the code
> for their actual arguments, but should still be referred to by the name of
> the formal -- attempts to refer to arguments by their actual names after
> substitution will not be recognized.

Again, unnecessary.

> 
> Note: 1) for user-defined functions created in the global environment, the
> defining and calling environment will often be the same; 2) a promise is
> fulfilled once – subsequent use of the same variable will return the same
> value, even if it has changed in the lookup scope (as just defined) in the
> interim. This also prevents arguments assigned to complex expressions from
> being recalculated if they are used multiple times in the same call; 3)
> although the immediately enclosing and defining environments are set when
> the function is defined, the chain of environments that enclose that
> environment is not determined until promise fulfillment.

I would recommend separating observations like 1) from rules like 2). 
The rules are pretty simple.  The consequences of them can be more complex.
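
For instance, the rule in 2) (a promise is evaluated at most once) can be checked with a side effect.  This is only a sketch; counter, bump and use_three_times are made-up names.

```r
counter <- new.env()
counter$n <- 0

# Each call to bump() increments the shared count and returns it.
bump <- function() {
  counter$n <- counter$n + 1
  counter$n
}

# x is used three times, but its promise is forced only on the first use.
use_three_times <- function(x) x + x + x

total <- use_three_times(bump())
calls <- counter$n   # how many times bump() actually ran
```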

3) is just wrong.  Promises have environments where their expressions 
are evaluated.

> 
> *Standard informal scoping*:
> 
> Names and functions in the execution environment not defined in formals are
> scoped like default arguments.

Again, this is unnecessary.  The body is just an expression that is 
evaluated in the evaluation frame.

> 
> There are probably more here that I am missing. Of course, there is the
> distinction between closures, primitives (builtin and special (are these
> catagories exhaustive?)) and internals (builtin and special). So maybe
> closures do standard body evaluation? And I am not sure where functions
> that use use the .C, .Fortran. .Call or .External interfaces fit in, except
> that I think typeof returns closure for all of them.
> 
> *Standard return:*
> 
> Return the value of the last expression to the calling environment, close
> and cleanup. I would also call return on return() or stop() standard.

The basic difference between standard evaluation and nonstandard 
evaluation is whether the function looks at the expression in promises, 
or only looks at the value when it is evaluated.  substitute() is the 
usual way to look at the expression, but packages like rlang define others.
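
A short sketch of the difference, with two made-up functions given the same argument:

```r
# Standard evaluation: only the value of the argument is used.
standard <- function(x) x * 2

# Nonstandard evaluation: substitute() retrieves the expression in the promise.
nonstandard <- function(x) deparse(substitute(x))

a <- 3
v <- standard(a + 1)      # uses the value of a + 1
s <- nonstandard(a + 1)   # uses the text of the expression itself
```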

Other issues that you haven't touched on that probably belong in a 
writeup like this are a description of how ... is handled, the rarely 
used ..1, ..2, etc., and the super-assignment operator <<-.
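
As a starting point for such a description, here is a rough sketch of all three.  The names dots_info and make_counter are invented for the example.

```r
# ... collects the extra arguments; ..1 and ..2 refer to them by position.
dots_info <- function(...) {
  args <- list(...)
  list(n = length(args), first = ..1, second = ..2)
}
info <- dots_info(10, "b", TRUE)

# <<- assigns to an existing binding in an enclosing environment instead of
# creating a new local variable.
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1   # updates n in make_counter's frame
    n
  }
}
counter <- make_counter()
first_call  <- counter()
second_call <- counter()
```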

Duncan Murdoch