[Rd] Hashed environments of size <5 never grow

Duncan Garmonsway Dunc@n@G@rmon@w@y @end|ng |rom gm@||@com
Mon Apr 11 23:24:52 CEST 2022


Hello,

Hashed environments that begin with a (non-default) size of 4 or less, will
never grow, which is very detrimental to performance.  For example,

```
n <- 10000
l <- vector("list", n)
l <- setNames(l, seq_len(n))

# Takes a second, and nchains remains 1.
e1 <- list2env(l, hash = TRUE, size = 1)
env.profile(e1)$nchains
# [1] 1

# Returns instantly, and nchains grows to 6950
e2 <- list2env(l, hash = TRUE, size = 5)
env.profile(e2)$nchains
# [1] 6950
```

The cause is that, when calling the growth function, the new size is
truncated to an integer.  See src/main/envir.c line 440, or
https://github.com/wch/r-source/blob/d9b9d00b6d2764839f229bf011dda8d027aae227/src/main/envir.c#L440

Given the hard-coded growth rate of 1.2, any size of 4 or less will be
truncated back to itself.

(int) (1 * 1.2 ) = 1
(int) (2 * 1.2) = 1
(int) (3 * 1.2) = 1
(int) (4 * 1.2) = 1
(int) (5 * 1.2) = 6

This is a rare case, and I couldn't find any examples in CRAN packages of
the `size` argument being used at all, let alone so small.  Even so, it
tripped me up, and could be fixed by using `ceil()` in src/main/envir.c
line 440 as follows.

new_table = R_NewHashTable((int)(ceil(HASHSIZE(table) *
HASHTABLEGROWTHRATE)))

Kind regards,
Duncan Garmonsway

	[[alternative HTML version deleted]]



More information about the R-devel mailing list