[Rd] Choices to remove `srcref` (and its buddies) when serializing objects

Dipterix Wang d|pter|x@w@ng @end|ng |rom gm@||@com
Fri Jan 12 06:11:45 CET 2024


Dear R devs,

I was digging into a package issue today when I realized that R's `serialize` function does not always generate the same results for equivalent objects; the output depends on how users choose to run the code. For example, the following code

serialize(with(new.env(), { function(){} }), NULL, TRUE)

generates different results when I copy-paste it into the console versus when I use Ctrl+Shift+Enter to source the file in RStudio.

Looking deeper into the cause, I found that functions and language objects get source references when getOption("keep.source") is TRUE. This means the source references make the functions differ, even though in most cases keeping the source does not affect how a function behaves.
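A minimal sketch of this effect, assuming that passing `keep.source` to `parse()` is a fair stand-in for the global option (the variable names are illustrative only):

```r
# Parse and evaluate the same function definition with and without source
# references, mimicking options(keep.source = TRUE / FALSE).
code <- "function() {}"
f_src   <- eval(parse(text = code, keep.source = TRUE))
f_nosrc <- eval(parse(text = code, keep.source = FALSE))

# Check which of the two closures carries a "srcref" attribute.
has_srcref_src   <- !is.null(attr(f_src, "srcref"))
has_srcref_nosrc <- !is.null(attr(f_nosrc, "srcref"))

# Compare the serialized bytes of the two behaviorally equivalent functions.
same_bytes <- identical(serialize(f_src, NULL), serialize(f_nosrc, NULL))

# utils::removeSource() drops the source reference from a single function.
f_clean <- utils::removeSource(f_src)

print(c(src = has_srcref_src, nosrc = has_srcref_nosrc, same_bytes = same_bytes))
```

Printing `attributes(f_src)` shows what extra data travels with the function when the source is kept.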

While it's OK that `serialize` itself generates different results, functions such as `rlang::hash` and `digest::digest`, which depend on `serialize`, might end up delivering false positives on the same inputs. I checked the source code of the digest package hoping to get around this issue (for example, with serialize(..., refhook = ...)). However, my workaround did not work. It seems that the markers on the objects are different even when I use `refhook` to force the srcref to be the same. I also tried `removeSource` and `rlang::zap_srcref`. Neither of them works directly on nested environments containing multiple functions.
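One possible user-level workaround for the nested-environment case is sketched below. It is not a feature of `serialize` or digest, and the helper name `strip_for_hash` is hypothetical. It assumes that, for hashing purposes, an environment can be replaced by a named list of its bindings and a function by its formals and body (the enclosing environment is ignored), so the result is not a drop-in replacement for the original object. It has not been checked against function definitions nested inside another function's body or inside default arguments.

```r
# Hypothetical helper: build a srcref-free description of `x` for hashing.
strip_for_hash <- function(x, seen = list()) {
  if (is.function(x)) {
    if (is.primitive(x)) return(x)
    f <- utils::removeSource(x)
    # Assumption: formals and body are enough to identify the function.
    return(list(formals = formals(f), body = body(f)))
  }
  if (is.environment(x)) {
    # Guard against cycles (an environment that contains itself).
    for (e in seen) if (identical(e, x)) return("<already visited>")
    seen <- c(seen, list(x))
    nms <- ls(x, all.names = TRUE)  # ls() sorts names by default
    out <- lapply(nms, function(n) {
      strip_for_hash(get(n, envir = x, inherits = FALSE), seen)
    })
    names(out) <- nms
    return(out)
  }
  if (is.list(x)) return(lapply(x, strip_for_hash, seen = seen))
  if (is.language(x)) return(utils::removeSource(x))
  x
}

# Build the same nested environment with and without source references.
make_env <- function(keep) {
  outer <- new.env(parent = emptyenv())
  inner <- new.env(parent = emptyenv())
  outer$f <- eval(parse(text = "function(x) x + 1", keep.source = keep), globalenv())
  inner$g <- eval(parse(text = "function(y) { y * 2 }", keep.source = keep), globalenv())
  outer$inner <- inner
  outer
}

with_src    <- make_env(TRUE)
without_src <- make_env(FALSE)

# Compare raw serialization against serialization of the stripped description.
same_raw <- identical(serialize(with_src, NULL), serialize(without_src, NULL))
same_stripped <- identical(
  serialize(strip_for_hash(with_src), NULL),
  serialize(strip_for_hash(without_src), NULL)
)
print(c(same_raw = same_raw, same_stripped = same_stripped))
```

The stripped description, rather than the original object, would then be passed to a hashing function such as `digest::digest`.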

I wonder how hard it would be to have an option to discard source references when serializing R objects?

Currently my analyses depend heavily on the `digest` function to generate file caches and automatically schedule pipelines (to update the cache) when changes are detected. The pipelines save the hashes of source code, inputs, and outputs together so other people can easily verify the calculation without accessing the original data (which could be sensitive), running hour-long analyses, or having to buy servers. All of these require `serialize` to produce the same results regardless of how users choose to run the code.

It would be great if this feature could be included in a future version of R. Other pipeline packages such as `targets` and `drake` could also benefit from it.

Thanks,

- Dipterix