[R] The stages of standard function evaluation

Thu May 3 05:04:34 CEST 2018

Dear R Help folks --

I have been trying to put together a list of the steps or stages of R
function evaluation, with particular focus on those that have "standard" or
"nonstandard" forms. This is both for my own edification and also because I
am thinking of joining the world of R bloggers and have been trying to put
together some draft postings that might be useful. I seem to have an
affirmative genius for finding incorrect interpretations of R's evaluation
rules; I'm trying to make that an asset.

I am hoping that you can tell me:

   1. Is this list complete, or are there additional stages I am missing?
   2. Have I inserted one or more imaginary stages?
   3. Are the terms I use below to name each stage appropriate, or are
   there other terms more widely used or recognizable?
   4. Is the order correct?

I begin each name with “Standard,” to express my belief that each of these
things has a usual or default form, but also that (unless I am mistaken)
almost none of them exist only in a single form true of all R functions. (I
have marked with an asterisk a few evaluation steps that I think may always
be followed).

It is my ultimate goal (which I do not feel at all close to accomplishing)
to determine a way to mechanically test for “standardness” along each of
these dimensions, so that each function could be assigned a logical vector
showing the ways that it is and is not standard. One thing I think is
conceptually or procedurally difficult about this project is that I think
“standardness” should be determined by what a function does, rather than by
how it does it, so that a primitive function that takes unevaluated
arguments positionally could still have standard matching, scoping, etc.,
by internal emulation. A related goal is to identify which evaluation steps
most often use an alternative form, and perhaps determine if there is more
than one such alternative. Finally, an easier short-term goal is simply to
find instances of one or more functions with standard and non-standard
evaluation for each evaluation step.

For the most part below I am treating the evaluation of closures as the
standard from which “nonstandard” is defined. However, I do not assume that
other kinds of functions are automatically nonstandard on any particular
dimension below. Most of this comes from the R Language Definition, but
there are numerous places where I am by no means certain that my
interpretation is correct. I have highlighted some of these below with a
“??”.

I look forward to learning from you.

Warmest regards,

J. Andrew Hoerner

* *Standard function recognition:* recognizing some or all of a string of
code as a function. (Part of code line parsing.)
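Parsing alone already yields a call object whose first element is whatever sits in function position; whether that symbol actually names a function is settled later, during lookup. A minimal check in base R:

```r
# Parse a line of code without evaluating it.
e <- parse(text = "sum(1, 2)")[[1]]

stopifnot(
  is.call(e),                         # the parser produced a call object
  identical(e[[1]], as.name("sum")),  # first element: the symbol in function position
  eval(e) == 3                        # evaluation is a separate, later step
)
```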

*Standard environment construction:* construction of the execution
environment, and of pointers to the calling and enclosing environments.

*Standard function identification:* Get the name of the function, if any.
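The environments and the call named in these two steps can be inspected from inside a function with base R's environment(), parent.env(), parent.frame() and sys.call(). A sketch; f and g are hypothetical names used only for illustration:

```r
f <- function() {
  list(
    exec      = environment(),              # new execution environment for this call
    enclosing = parent.env(environment()),  # enclosure: where f was defined
    calling   = parent.frame(),             # environment the call was evaluated in
    call      = sys.call()                  # the call as written, e.g. f()
  )
}
g <- function() {
  info <- f()
  identical(info$calling, environment())    # g's frame is f's calling environment
}

info <- f()
stopifnot(
  g(),
  identical(info$enclosing, environment(f)),
  !identical(f()$exec, f()$exec),            # each call gets a fresh environment
  identical(info$call[[1]], as.name("f")),   # called by name: first element is a symbol
  !is.name((function() sys.call())()[[1]])   # anonymous call: no name to recover
)
```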

* *Standard function scoping:* Search the current environment and then up
the chain of enclosing environments until you find the first binding
environment, an environment where the name of the function is bound to a
function, i.e. linked to a pointer to the function definition (bindings of
that name to non-function objects are skipped). The binding environment is
usually (but not always) the same as the defining environment (i.e. the
enclosing environment when the function is defined). Note that function
lookup, unlike function argument lookup, necessarily starts from the
calling environment, because a function does not know what it is (its
formals, body, and environments) until it is found. Named functions are
always found by scoping. R never learns "where" they are -- they have to be
looked up each time. For this reason, anonymous functions must be used in
place: either passed to a function that takes a function as an argument, or
called directly as (function(formals){body})(actual args).
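A sketch of three points from this step (helper, g and h are hypothetical names): a function name is looked up at every call, a non-function binding with the same name is skipped when the name is in function position, and an anonymous function is called where it is written.

```r
# 1. Lookup happens at call time: rebinding `helper` changes what g() calls.
g <- function() helper()
helper <- function() "first"
first <- g()
helper <- function() "second"
second <- g()

# 2. In function position, R skips bindings that are not functions.
h <- function() {
  c <- 10      # a non-function object named `c` in the execution environment
  c(c, 1)      # `c(` still finds base::c; the argument `c` finds the local 10
}

# 3. An anonymous function called in place.
anon <- (function(x, y) x + y)(1, 2)

stopifnot(first == "first", second == "second",
          identical(h(), c(10, 1)), anon == 3)
```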

*Standard function retrieval:*

Load (??) the function, i.e. transfer the list (??) of formals and defaults
and the list (??) of expressions that constitute the function body into the
execution environment.

Note that the function body is parsed at the time the function is created
(true?? Or is it parsed every time the function is called?)
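One way to probe this question: formals(), body() and environment() expose the three parts of a closure, and body() returns an already-parsed call rather than text. A sketch with a hypothetical f:

```r
f <- function(x, y = 2) {
  z <- x + y
  z * 2
}

stopifnot(
  identical(names(formals(f)), c("x", "y")),  # formals: names plus stored defaults
  identical(formals(f)$y, 2),                 # the default expression (here a constant)
  is.call(body(f)),                           # the body is stored as a parsed call
  identical(body(f)[[1]], as.name("{")),
  identical(environment(f), environment())    # enclosure: where f was created
)

# Replacing the stored call changes behaviour; the function runs the new
# language object, with no source text involved.
body(f) <- quote(x - y)
stopifnot(f(5) == 3)
```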

*Standard argument matching:*

Assignment of expressions and default arguments to formals via the
usually stated matching rules. If matched positionally at call time, the
name is scoped like an actual argument. Note that giving an argument the
same name as a formal when calling the function (e.g. calling f(x) where x
is also a formal of f) does not by itself match it to that formal: it is
matched positionally or by tag (x = x), not by the variable's name.
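The matching order (exact tag, then partial tag, then position) and the point about same-named variables can be checked directly; f, alpha and beta are hypothetical names:

```r
f <- function(alpha, beta) c(alpha = alpha, beta = beta)

exact   <- f(beta = 2, 1)  # `beta` matched by exact tag; 1 then fills alpha by position
partial <- f(be = 2, 1)    # `be` partially matches `beta`

beta <- 5
positional <- f(beta, 1)   # untagged variable named `beta` is matched by position (to alpha)

stopifnot(
  identical(exact, c(alpha = 1, beta = 2)),
  identical(partial, c(alpha = 1, beta = 2)),
  identical(positional, c(alpha = 5, beta = 1))
)
```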

*Standard argument parsing:*

Converts argument expressions into abstract syntax trees. “Standard”
argument parsing and promise construction take place before the arguments
are passed into the body.

<Or do matching and parsing happen in reverse order?>
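As a probe of the ordering question: substitute() returns the expression bound to a formal without evaluating it, and that expression is already an AST when the body starts running, which is consistent with the whole call being parsed before any matching happens. A sketch with a hypothetical f:

```r
f <- function(x) substitute(x)  # the expression bound to x, not forced
e <- f(a + b * 2)               # `a` and `b` need not exist; nothing is evaluated

stopifnot(
  is.call(e),
  identical(e[[1]], as.name("+")),    # the AST already encodes precedence: a + (b * 2)
  identical(deparse(e), "a + b * 2")
)
```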

*Standard promise construction:*

Assigning each name (including function names) in an AST to its binding in
the calling environment (if ordinary) or the execution environment (if
default). (Am I right that the action here for calls and for names is
essentially the same, and happens at the same time using the same lookup
procedure?) Note that scoping, in the form of search, applies only when the
function is called. Formals are matched, but they are never scoped, except
that their default values are assigned into the function body when the
function is called and then scoped from there if they are not assigned to a
value before they are used. Actual arguments on function call that are not
found in the calling environment are scoped ?? up the call tree ?? until
they reach the top level, and then up the search path.

“R Language Definition 4.3.3 Argument evaluation: One of the most
important things to know about the evaluation of arguments to a function is
that supplied arguments and default arguments are treated differently. The
supplied arguments to a function are evaluated in the evaluation frame of
the calling function. The default arguments to a function are evaluated in
the evaluation frame of the function.”
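The quoted rule can be seen in a small sketch (x, f, a and b are hypothetical names): the default is evaluated in f's own frame, after f has rebound x, while the supplied b = x is evaluated in the caller's frame.

```r
x <- "caller's x"
f <- function(a = x, b) {
  x <- "f's own x"
  c(default = a, supplied = b)  # both promises are forced here, after the assignment
}

res <- f(b = x)
stopifnot(identical(res, c(default = "f's own x", supplied = "caller's x")))
```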

(Note 1: In some places closures are described as capturing their defining
environment, as if they made and stored a copy. For instance, from the R
Language Definition 2.1.5: “Any symbols bound in that environment are
*captured* and available to the function. This combination of the code of
the function and the bindings in its environment is called a ‘function
closure’, a term from functional programming theory. In this document we
generally use the term ‘function’, but use ‘closure’ to emphasize the
importance of the attached environment.”

But I think what gets passed in promises are pointers to objects in the
environment, not the environment in its entirety, nor even the objects the
pointers point to. These are sought only when the promise is kept, and
actually copied into the execution environment only if they are
subsequently altered. If they are only used as function arguments and not
altered themselves they are, I believe, used in place, without copying.)
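A sketch of both ideas in this note (capture, get_v, make_counter and counter are hypothetical names): a promise holds an expression plus a reference to its environment and is evaluated only when forced, and a closure keeps a reference to its live enclosing environment rather than a snapshot of it.

```r
# A promise is evaluated only when forced, against the environment as it is then.
capture <- function(arg) function() arg  # `arg` is not forced when capture() runs
v <- 1
get_v <- capture(v)
v <- 2
late <- get_v()                           # forced only now, so it sees v == 2

# A closure refers to its enclosing environment, which can keep changing.
make_counter <- function() {
  n <- 0
  function() { n <<- n + 1; n }
}
counter <- make_counter()
once  <- counter()
twice <- counter()

stopifnot(late == 2, once == 1, twice == 2, environment(counter)$n == 2)
```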

(Note 2. Assigning an argument to a formal via = creates a default argument
only in a function definition. When such an assignment is made during a
function call, the RHS scopes to the calling environment and up, not to the
function body.)

*Standard body construction:*

Replace each occurrence of a formal within the function with the value of
that formal if a constant, and otherwise with the AST identified with that
formal and any promises it contains (sometimes collectively called the
actual arguments, as distinct from the formal arguments).

*Standard body execution:*

It is strange to me that everything I know about function evaluation,
standard or non-standard, seems to be about getting arguments into or out
of the body of the function. If there is anything strictly internal to body
execution that can be called standard, I don't know what it is. Here are a
couple of candidates, but this list seems very incomplete to me:

* *Standard expression sequencing:* Expressions in the body are executed
sequentially except as that sequence is altered by flow control functions
(if/else, while, switch, etc.) and block grouping functions ({}).
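Both block grouping and flow control are themselves functions (primitives of type special) and can be called in prefix form, which makes the sequencing rule visible:

```r
# `{` evaluates its arguments in order and returns the last value.
block <- `{`(1, 2, 3)

# `if` receives its branches unevaluated; the branch not taken never runs.
branch <- `if`(TRUE, "taken", stop("never evaluated"))

stopifnot(block == 3, branch == "taken",
          typeof(`{`) == "special", typeof(`if`) == "special")
```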

*Standard promise triggering:* Recognition that an action on the promise
that constitutes "use" has taken place and that the promise now needs to be
fulfilled. (I have never seen a clear statement of exactly what uses do and
do not trigger fulfillment of a promise).
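Not an exhaustive list, but a sketch of some uses that do and do not force a promise (ignore, inspect, pass_on and use are hypothetical names); stop() inside the argument acts as a tripwire that fires only if the promise is forced:

```r
ignore  <- function(x) "never forced"
inspect <- function(x) { missing(x); substitute(x); "still not forced" }
pass_on <- function(x) ignore(x)           # passing x along creates another promise
use     <- function(x) { force(x); "forced" }

err <- tryCatch(use(stop("boom")), error = function(e) conditionMessage(e))

stopifnot(
  ignore(stop("boom")) == "never forced",
  inspect(stop("boom")) == "still not forced",
  pass_on(stop("boom")) == "never forced",
  err == "boom"
)
```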

*Standard promise fulfillment/argument scoping:* For names passed via
formals, scope up the environment hierarchy from the calling environment
and then up the call chain ??. For default arguments, scope starting in the
function’s execution environment (the parent of which is the defining
environment ??), to the first environment where the name exists, and fetch
the value. For functions, recurse.

Inside of the body, formals are always referred to by the name of the
formal, not the name of the variables assigned to the formal. When the
function is called, instances of the formal will be replaced by the code
for their actual arguments, but should still be referred to by the name of
the formal -- attempts to refer to arguments by their actual names after
substitution will not be recognized.

Note: 1) for user-defined functions created in the global environment, the
defining and calling environment will often be the same; 2) a promise is
fulfilled once – subsequent use of the same variable will return the same
value, even if it has changed in the lookup scope (as just defined) in the
interim. This also prevents arguments assigned to complex expressions from
being recalculated if they are used multiple times in the same call; 3)
although the immediately enclosing and defining environments are set when
the function is defined, the chain of environments that enclose that
environment is not determined until promise fulfillment.
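Point 2) can be checked with side effects (calls, bump, triple, v and snapshot are hypothetical names): the first block counts how often the argument expression runs, and the second changes the source variable after the promise has been forced.

```r
calls <- 0
bump <- function() { calls <<- calls + 1; calls }
triple <- function(x) c(x, x, x)
tripled <- triple(bump())       # bump() runs once; the value is then reused

v <- "before"
snapshot <- function(x) {
  first <- x                    # forces the promise
  v <<- "after"                 # change the variable the promise was built from
  c(first, x)                   # x is not re-evaluated
}
snap <- snapshot(v)

stopifnot(identical(tripled, c(1, 1, 1)), calls == 1,
          identical(snap, c("before", "before")))
```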

*Standard informal scoping:*

Names and functions in the execution environment not defined in formals are
scoped like default arguments.
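A sketch of how a free variable in the body is resolved (make_f, f, caller and z are hypothetical names): the search goes through the defining environment's chain, so a binding of the same name in the caller is not seen.

```r
make_f <- function() {
  z <- "defining environment"
  function() z                 # z is free in this body
}
f <- make_f()

caller <- function() {
  z <- "calling environment"
  f()
}

stopifnot(caller() == "defining environment")
```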

There are probably more here that I am missing. Of course, there is the
distinction between closures, primitives (builtin and special; are these
categories exhaustive?) and internals (builtin and special). So maybe
closures do standard body evaluation? And I am not sure where functions
that use the .C, .Fortran, .Call or .External interfaces fit in, except
that I think typeof returns closure for all of them.
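typeof() and is.primitive() show how a few familiar base functions are classified; paste, for example, is a closure whose body calls .Internal():

```r
stopifnot(
  typeof(mean)  == "closure",  # ordinary R function (dispatches via UseMethod)
  typeof(paste) == "closure",  # R-level wrapper whose body calls .Internal()
  typeof(sum)   == "builtin",  # primitive; arguments evaluated before the call
  typeof(quote) == "special",  # primitive; receives its argument unevaluated
  is.primitive(sum), is.primitive(quote), !is.primitive(paste)
)
```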

*Standard return:*

Return the value of the last expression to the calling environment, close
and clean up. I would also call exiting via return() or stop() standard.
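A sketch of the exits mentioned here (sign_word, fails and cleaned_up are hypothetical names): the last value, an early return(), and an error, with on.exit() showing that cleanup runs on each path.

```r
sign_word <- function(x) {
  if (x > 0) return("positive")  # early exit via return()
  "non-positive"                 # otherwise the last value is returned
}

cleaned_up <- FALSE
fails <- function() {
  on.exit(cleaned_up <<- TRUE)   # registered cleanup, run even when exiting by error
  stop("failed")
}
outcome <- tryCatch(fails(), error = function(e) "error caught by caller")

stopifnot(sign_word(1) == "positive", sign_word(-1) == "non-positive",
          outcome == "error caught by caller", cleaned_up)
```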

-- 
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820
