[Rd] WISH: Optional mechanism preventing var <<- value from assigning non-existing variable
Henrik Bengtsson
Sun Mar 19 17:08:20 CET 2023
I'd like to be able to prevent the <<- assignment operator ("super
assignment") from assigning to the global environment unless the
variable already exists and is not locked. If it does not exist or is
locked, I'd like an error to be produced. This would allow me to
evaluate expressions with this check temporarily enabled, to protect
against mistakes.
For example, I'd like to do something like:
$ R --vanilla
> exists("a")
[1] FALSE
> options(check.superassignment = TRUE)
> local({ a <<- 1 })
Error: object 'a' not found
> a <- 0
> local({ a <<- 1 })
> a
[1] 1
> rm("a")
> options(check.superassignment = FALSE)
> local({ a <<- 1 })
> exists("a")
[1] TRUE
BACKGROUND:
From help("<<-") we have:
"The operators <<- and ->> are normally only used in functions, and
cause a search to be made through parent environments for an existing
definition of the variable being assigned. If such a variable is found
(and its binding is not locked) then its value is redefined, otherwise
assignment takes place in the global environment."
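The lookup rule quoted above can be seen in a few lines of R. This is
only an illustrative sketch; the names make_counter and
undeclared_demo_var are made up for the example.

```r
# Case 1: `count` exists in the enclosing environment of the inner
# function, so <<- finds it there and updates it in place.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
first  <- counter()
second <- counter()

# Case 2: no enclosing environment defines the variable, so <<- falls
# back to creating it in the global environment.
f <- function() undeclared_demo_var <<- "created by <<-"
f()
created <- exists("undeclared_demo_var", envir = globalenv(), inherits = FALSE)

# Clean up the variable that leaked into the global environment.
rm("undeclared_demo_var", envir = globalenv())
```

The second case is the fallback that this wish is about: nothing in the
code declares the variable, yet it ends up in the global environment.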
I argue that it's unfortunate that <<- falls back to assigning to
the global environment if the variable does not already exist.
Unfortunately, it has become a "go to" solution for many to use it
that way. Sometimes it is intended, sometimes it's a mistake. We
also find it in R packages on CRAN, even though 'R CMD check' tries to
detect when it happens (but it can only do so from run-time
examples and tests).
It's probably too widely used for us to permanently switch to a
stricter behavior. The proposed R option allows me, as a developer,
to evaluate an R expression with the strict behavior, especially if I
don't trust the code.
With 'check.superassignment = TRUE' set, a developer would have to
first declare the variable in the global environment for <<- to assign
there. This would remove the fallback "If such a variable is found
(and its binding is not locked) then its value is redefined, otherwise
assignment takes place in the global environment" in the current
design. Those who truly intend to assign to the global environment
could use assign(var, value, envir = globalenv()) or
globalenv()[[var]] <- value.
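The strict behavior can be approximated today with an ordinary
function. The sketch below is not part of R and is not how the option
would be implemented internally; strict_superassign, scope_outer, and
scope_inner are hypothetical names. It walks the enclosing environments
the way <<- does, but signals an error instead of falling back to the
global environment.

```r
# Hypothetical helper: assign `value` to `name` only if `name` is already
# defined in an environment enclosing `envir`; otherwise signal an error.
strict_superassign <- function(name, value, envir = parent.frame()) {
  env <- parent.env(envir)  # like <<-, start looking in the parent
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      # assign() itself signals an error if the binding is locked
      assign(name, value, envir = env)
      return(invisible(value))
    }
    env <- parent.env(env)
  }
  stop(sprintf("object '%s' not found", name), call. = FALSE)
}

# Demonstration on isolated environments, so nothing global is touched.
scope_outer <- new.env(parent = emptyenv())
scope_inner <- new.env(parent = scope_outer)

# 1. Variable not declared anywhere: error instead of silent creation.
err_missing <- tryCatch(
  strict_superassign("a", 1, envir = scope_inner),
  error = function(e) conditionMessage(e)
)

# 2. Variable declared first: the assignment goes through.
assign("a", 0, envir = scope_outer)
strict_superassign("a", 1, envir = scope_inner)
value_after <- get("a", envir = scope_outer)

# 3. Binding locked: error again.
lockBinding("a", scope_outer)
err_locked <- tryCatch(
  strict_superassign("a", 2, envir = scope_inner),
  error = function(e) conditionMessage(e)
)
```

Unlike the proposed option, such a helper only protects code that calls
it explicitly; it cannot guard against <<- in code written by others.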
'R CMD check' could temporarily set 'check.superassignment = TRUE'
during checks. If we let environment variable
'R_CHECK_SUPERASSIGNMENT' set the default value of option
'check.superassignment' on R startup, it would be possible to check
packages optionally this way, but also to run any "non-trusted" R
script in the "strict" mode.
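As a rough sketch of the startup logic, assuming the option and the
environment variable were adopted under the names proposed here
(neither exists in R today, and default_check_superassignment is a
made-up helper name):

```r
# Hypothetical: derive the default of the proposed option from the
# proposed environment variable. Anything that as.logical() does not
# parse as TRUE (including an unset or empty variable) counts as FALSE.
default_check_superassignment <- function() {
  value <- Sys.getenv("R_CHECK_SUPERASSIGNMENT", unset = "FALSE")
  isTRUE(as.logical(value))
}

# Setting an option that R does not know about is harmless; today it
# has no effect on how <<- behaves.
options(check.superassignment = default_check_superassignment())
```

A script could then be run in the "strict" mode with something like
R_CHECK_SUPERASSIGNMENT=true Rscript script.R, without editing the
script itself.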
TEASER:
Here's an example why using <<- for assigning to the global
environment is a bad idea:
This works:
$ R --vanilla
> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
> keep
[1] 3
This doesn't work:
$ R --vanilla
> library(purrr)
> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
Error in keep <<- x : cannot change value of locked binding for 'keep'
But, if we "declare" the variable first, it works:
$ R --vanilla
> library(purrr)
> keep <- 0
> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
> keep
[1] 3
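The failure arises because purrr exports a function called keep(), and
bindings in attached package environments are locked, so <<- finds that
binding on the search path before it ever reaches the fallback. The
same situation can be reproduced without purrr using plain
environments. In this sketch, pkg_env stands in for an attached package
and user_env for the global environment; both names are made up for the
example.

```r
# pkg_env plays the role of an attached package that exports keep().
pkg_env <- new.env(parent = baseenv())
assign("keep", function(...) NULL, envir = pkg_env)
lockBinding("keep", pkg_env)

# user_env plays the role of the global environment, sitting in front
# of the package on the lookup chain.
user_env <- new.env(parent = pkg_env)

f <- function(x) keep <<- x
environment(f) <- user_env

# Not declared in user_env: <<- walks on, finds the locked binding in
# pkg_env, and fails.
msg <- tryCatch(f(3), error = function(e) conditionMessage(e))

# Declared in user_env first: <<- stops there and the assignment works.
assign("keep", 0, envir = user_env)
f(3)
kept <- get("keep", envir = user_env, inherits = FALSE)
```

Whether the undeclared assignment succeeds therefore depends on which
packages happen to be attached, which is exactly the kind of mistake a
strict mode would surface early.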
/Henrik
PS. Does the <<- operator have an official name? Hadley calls it
"super assignment" in 'Advanced R'
(https://adv-r.hadley.nz/environments.html), which is where I got it
from.