[Rd] Hashed environments of size <5 never grow
Duncan Garmonsway
Dunc@n@G@rmon@w@y @end|ng |rom gm@||@com
Mon Apr 11 23:24:52 CEST 2022
Hello,
Hashed environments that begin with a (non-default) size of 4 or less will
never grow, which is very detrimental to performance. For example,
```
n <- 10000
l <- vector("list", n)
l <- setNames(l, seq_len(n))
# Takes a second, and nchains remains 1.
e1 <- list2env(l, hash = TRUE, size = 1)
env.profile(e1)$nchains
# [1] 1
# Returns instantly, and nchains grows to 6950
e2 <- list2env(l, hash = TRUE, size = 5)
env.profile(e2)$nchains
# [1] 6950
```
The cause is that, when calling the growth function, the new size is
truncated to an integer. See src/main/envir.c line 440, or
https://github.com/wch/r-source/blob/d9b9d00b6d2764839f229bf011dda8d027aae227/src/main/envir.c#L440
Given the hard-coded growth rate of 1.2 (HASHTABLEGROWTHRATE), any size of
4 or less is truncated back to itself:
```
(int) (1 * 1.2) = 1
(int) (2 * 1.2) = 2
(int) (3 * 1.2) = 3
(int) (4 * 1.2) = 4
(int) (5 * 1.2) = 6
```
This is a rare case: I couldn't find any CRAN package that uses the `size`
argument at all, let alone with so small a value. Even so, it tripped me
up, and it could be fixed by using `ceil()` in src/main/envir.c line 440,
as follows:
```c
new_table = R_NewHashTable((int)(ceil(HASHSIZE(table) *
                                      HASHTABLEGROWTHRATE)));
```
Kind regards,
Duncan Garmonsway