[Rd] GC: speeding-up the CHARSXP cache maintenance, 2nd try

Wed Nov 3 12:51:43 CET 2021

Hi,

In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed to speed up the CHARSXP cache maintenance during GC using threading. This was rejected by Luke in https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html.

Here I want to propose an alternative approach to significantly speed up CHARSXP cache maintenance during partial GCs. A patch which passes `make check-devel` is attached. Compared to R devel (revision 81110) I get the following performance improvements on my system:

Elapsed time for five non-full gc in a session after

x <- as.character(runif(5e7))[]
gc(full = TRUE)

+20sec -> ~1sec.

This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. However, I did not measure dramatic differences:

y <- "old_CHARSXP" 

after

x <- "old_CHARSXP"; gc(); gc()

takes a median 32 nanoseconds with and without the patch.

gc(full = TRUE)

in a new session takes a median 16 milliseconds with and 14 without the patch.

The basic idea is to maintain the CHARSXP cache using subtables in R_StringHash, one for each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS + 1) GC generations. New CHARSXPs are added by mkCharLenCE() to the subtable of the youngest generation. After a partial GC, only the chains anchored at the subtables of the youngest (num_old_gens_to_collect + 1) generations need to be searched for and cleaned of unmarked nodes. Afterwards, these chains need to be merged into those of the respective next generation, if any. This approach relies on the fact that an object/CHARSXP can never become younger again. It is OK though if an object/CHARSXP "skips" a GC generation.

R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is structured such that the chains for the same hashcode but for different generations are anchored at slots of R_StringHash which are next to each other in memory. This is because we often need to access two or more (i.e. currently all three) of them for one operation and this avoids cache misses.

HASHPRI, i.e. the number of occupied primary slots, is computed and stored as NUM_GC_GENERATIONS times the number of slots which are occupied in at least one of the subtables. This is done because in mkCharLenCE() we need to iterate through one or more chains if and only if there is a chain for the particular hashcode in at least one subtable.

I tried to keep the patch as minimal as possible. In particular, I did not add long vector support to R_StringHash. I rather reduced the max value of char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(), but I could do so if the patch is accepted.

Thanks for your consideration and feedback.

Regards,
Andreas

P.S. I had a hard time to get the indentation right in the patch due the mix of tabs and spaces. Sorry, if I screwed this up.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r_stringhash.diff
Type: text/x-patch
Size: 8495 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20211103/078ff701/attachment.bin>