[Rd] How do I reliably and efficiently hash a function?

Fri Dec 11 00:49:50 CET 2015

I’ve got the following scenario: I need to store information about an
R function, and retrieve it at a later point. In other programming
languages I’d implement this using a dictionary with the functions as
keys. In R, I’d usually use `attr(f, 'some-name')`. However, for my
purposes I do not want to use `attr` because the information that I
want to store is an implementation detail that should be hidden from
the user of the function (and, just as importantly, it shouldn’t
clutter the display when the function is printed on the console).
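Here is roughly what the `attr` route looks like when the function is printed. This is a minimal sketch: `info` is a made-up attribute name, and the stored environment is only a placeholder.

```r
# A toy function standing in for the user's function.
f <- function(x) x + 1

# Stash an environment on it as an ordinary attribute ("info" is hypothetical).
attr(f, "info") <- new.env()

# Printing shows the attribute after the function's code.
out <- capture.output(print(f))
print(out)
```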

`comment` would be almost perfect since it’s hidden from the output
when printing a function. Unfortunately, the information I’m storing is
not a character string (it’s in fact an environment), and `comment<-`
only accepts character vectors, so I cannot use `comment`.
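The following quick check covers both halves of that statement. A character comment is stored but is not shown when printing, whereas an environment is rejected by `comment<-`. The comment text is arbitrary.

```r
f <- function(x) x + 1

# A character comment is accepted and hidden from printed output.
comment(f) <- "some hidden note"
out <- capture.output(print(f))
any(grepl("hidden note", out))  # FALSE

# A non-character value is rejected with an error.
res <- tryCatch({
  comment(f) <- new.env()
  "accepted"
}, error = function(e) conditionMessage(e))
print(res)
```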

How can this be achieved?

For reference, I’ve considered the following two alternatives:

1. Use `attr`, and override `print.function` so that it does not print
my attribute. However, I’m wary of overriding a core function just to
implement such a little thing, and the override would obviously clash
with other overrides if somebody else happens to have a similarly
harebrained idea.
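To make option 1 concrete, here is a minimal sketch of such an override. It assumes the hidden data lives under a hypothetical attribute name `info`. It strips the attribute from a local copy only, so the caller’s function keeps it, and then delegates to `print.default`:

```r
# Hypothetical attribute name under which the hidden data is stored.
hidden_attr <- "info"

# Global S3 method for printing functions: remove the hidden attribute
# from a local copy, print that copy, and return the original invisibly.
print.function <- function(x, ...) {
  shown <- x
  attr(shown, hidden_attr) <- NULL
  print.default(shown, ...)
  invisible(x)
}

f <- function(x) x + 1
attr(f, hidden_attr) <- new.env()

out <- capture.output(print(f))
print(out)
```

Because this method is defined in the global environment, another `print.function` defined elsewhere would shadow it or be shadowed by it, depending on where each is defined and called from. That is exactly the clash described above.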

2. Use C++ to retrieve the SEXP pointing to the body of the CLOSXP that
represents a function, and use that pointer as a key in a dictionary. I
*think* that this robustly and efficiently identifies functions in R.
However, this relies quite heavily on R internal implementation
details, in particular on the fact that the garbage collector does not
move objects around in memory. The current GC doesn’t, but Gábor
Csárdi rightfully pointed out to me that this might change.
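For comparison, here is a sketch of a pure-R registry that avoids raw pointers by comparing keys with `identical()`. The names `new_registry`, `set` and `get` are hypothetical. It does not depend on memory addresses, but every lookup is a linear scan, so it does not scale well to many functions:

```r
# Hypothetical function-keyed registry: keys are compared with identical(),
# so no memory addresses are involved, but lookup is O(n).
new_registry <- function() {
  keys <- list()
  values <- list()
  find <- function(f) {
    for (i in seq_along(keys)) if (identical(keys[[i]], f)) return(i)
    0L
  }
  list(
    # `value` must not be NULL: assigning NULL via [[<- would drop the entry.
    set = function(f, value) {
      i <- find(f)
      if (i == 0L) i <- length(keys) + 1L
      keys[[i]] <<- f
      values[[i]] <<- value
      invisible(value)
    },
    get = function(f) {
      i <- find(f)
      if (i == 0L) NULL else values[[i]]
    }
  )
}

reg <- new_registry()
f <- function(x) x + 1
g <- function(x) x * 2
original <- new.env()
reg$set(f, original)
identical(reg$get(f), original)  # TRUE
is.null(reg$get(g))              # TRUE
```

Two caveats apply. First, `identical()` compares a closure’s formals, body and environment, so two separately created closures with the same code in the same environment would share an entry. Second, re-assigning `environment(f)` produces a closure that no longer matches the stored key.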

In case I’m asking about the wrong Y in an X/Y problem, the full
context is explained in [1]. In a nutshell, I am hooking a new
environment into a function’s environment chain by splicing it in with
`parent.env<-` (naughty, I know):

```
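# Splice my_new_env between f and f's original enclosing environment;
# a closure's environment is accessed via environment(), not parent.env()
parent.env(my_new_env) = environment(f)
environment(f) = my_new_env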
```
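Below is a self-contained sketch of the splice, using hypothetical names (`greeting`, `original_env`). It reads and sets the closure’s enclosing environment with `environment(f)`, because `parent.env()` only accepts environments and signals an error when given a function:

```r
# A function that refers to `greeting`, which nothing it can see defines yet.
f <- function() greeting
original_env <- environment(f)

# Hypothetical environment holding the objects `f` should find.
my_new_env <- new.env()
assign("greeting", "hello", envir = my_new_env)

# Splice `my_new_env` between `f` and its original enclosing environment.
parent.env(my_new_env) <- environment(f)
environment(f) <- my_new_env

f()                                                  # "hello"
identical(parent.env(environment(f)), original_env)  # TRUE
```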

This is done so that the function `f` finds objects defined inside
that environment without my having to attach it globally. However, for
bookkeeping purposes I need to preserve each function’s original
parent environment, hence the question.

[1]: https://github.com/klmr/modules/issues/66