[Rd] GC: improving the marking performance for STRSXPs
Andreas Kersting
r-deve| @end|ng |rom @ker@t|ng@de
Thu Oct 7 09:33:42 CEST 2021
Hi all,
in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its children. I claim that this is unnecessarily inefficient since the children of a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked directly in the call of FORWARD_CHILDREN() on the STRSXP.
Attached patch (atomic_CHARSXP.diff) implements this and gives the following performance improvements on my system compared to R devel (revision 81008):
Elapsed time for two full gc in a session after
x <- as.character(runif(5e7))[]
19sec -> 15sec.
This is the best-case scenario for the patch: very many unique/unmarked CHARSXP in the STRSXP. For already marked CHARSXP there is no performance gain since FORWARD_NODE() is a no-op for them.
The relative performance gain is even bigger if iterating through the STRSXP produces many cache misses, as e.g. after
x <- as.character(runif(5e7))[]
x <- sample(x, length(x))
Elapsed time for two full gc here: 83sec -> 52sec. This is because we have less cache misses per CHARSXP.
This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to be traced because they are just used for maintaining the CHARSXP hash chains.
The second attached patch (atomic_CHARSXP_safe_unlikely.diff) checks both assumptions and calls gc_error() if they are violated and is still noticeably faster than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively.
Attached gc_test.R is the script I used to get the previously mentioned and more gc timings.
Do you think that this is a reasonable change? It does make the code more complex and I am not sure if there might be situations in which the assumptions are violated, even though SET_STRING_ELT() and installAttrib() do enforce them.
Best regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: atomic_CHARSXP.diff
Type: text/x-patch
Size: 1948 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20211007/f018a189/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: atomic_CHARSXP_safe_unlikely.diff
Type: text/x-patch
Size: 2427 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20211007/f018a189/attachment-0001.bin>
More information about the R-devel
mailing list