[Rd] WISH: Built-in R session-specific universally unique identifier (UUID)

William Dunlap <wdunlap at tibco.com>
Tue May 21 02:42:12 CEST 2019


I think a machine-specific input to the UUID, like the MAC address, is
essential.  S+ used to make a seed for the random number generator based on
the current time and process ID.  A customer complained that all
machines in his cluster generated the same random number stream.  The
machines were rebooted each night, simultaneously, and S+ was started
during the boot process, so times and process IDs were identical; hence the
seeds were identical.
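
A minimal sketch of that failure mode, not the actual S+ code: make_seed()
is a hypothetical helper that hashes time, PID and an optional host string
into an integer seed. The host names are made up, and in practice something
like Sys.info()[["nodename"]] could stand in for the MAC address, which base R
does not expose.

```r
# Hypothetical helper: fold time, PID and a host-specific string into a seed.
make_seed <- function(time, pid, host = "") {
  key <- paste(format(as.numeric(time), digits = 15), pid, host, sep = "|")
  h <- 0
  # Simple polynomial hash; the modulus keeps h within integer range.
  for (b in utf8ToInt(key)) h <- (h * 31 + b) %% 2147483647
  as.integer(h)
}

# Two machines booted at the same instant, R started with the same PID.
boot_time <- as.POSIXct("2019-05-21 02:00:00", tz = "UTC")
pid <- 100L

# Time and PID only: both machines get the same seed.
same_seed <- identical(make_seed(boot_time, pid), make_seed(boot_time, pid))

# Adding a per-machine input separates them.
seed_a <- make_seed(boot_time, pid, host = "node01")
seed_b <- make_seed(boot_time, pid, host = "node02")
differ <- seed_a != seed_b
```

Here same_seed is TRUE and differ is TRUE; only the host string breaks the tie.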

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, May 20, 2019 at 4:48 PM Henrik Bengtsson <henrik.bengtsson using gmail.com>
wrote:

> # Proposal
>
> Provide a built-in mechanism for obtaining an identifier for the
> current R session, e.g.
>
> > Sys.info()[["session_uuid"]]
> [1] "4258db4d-d4fb-46b3-a214-8c762b99a443"
>
> The identifier should be "unique" in the sense that the probability
> for two R sessions(*) having the same identifier should be extremely
> small.  There's no need for reproducibility, i.e. the algorithm for
> producing the identifier may be changed at any time.
>
> (*) Two R sessions running at different times (seconds, minutes, days,
> years, ...) or on different machines (locally or anywhere in the
> world).
>
>
> # Use cases
>
> In parallel-processing workflows, R objects may be "exported"
> (serialized) to background R processes ("workers") for further
> processing.  In other workflows, objects may be saved to file to be
> reloaded in a future R session.  However, certain types of objects in
> R may only be relevant, or valid, in the R session that created
> them.  Attempts to use them in other R processes may give an obscure
> error or in the worst case produce garbage results.
>
> Having an identifier that is unique to each R process will make it
> possible to detect when an object is used in the wrong context.  This
> can be done by attaching the session identifier to the object.  For
> example,
>
> obj <- 42L
> attr(obj, "owner") <- Sys.info()[["session_uuid"]]
>
> With this, it is easy to validate the "ownership" later:
>
> stopifnot(identical(attr(obj, "owner"), Sys.info()[["session_uuid"]]))
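
A rough sketch of how that check might behave across sessions. Since
Sys.info()[["session_uuid"]] is only proposed and does not exist in base R,
current_session() below is a stand-in that reads a made-up option
(demo.session_uuid); set_owner() and is_owned() are likewise hypothetical
helpers, and the second "session" is simulated by changing the option.

```r
# Stand-in for the proposed Sys.info()[["session_uuid"]] (not in base R).
current_session <- function() getOption("demo.session_uuid")

# Tag an object with the session that created it.
set_owner <- function(obj) {
  attr(obj, "owner") <- current_session()
  obj
}

# Check whether an object belongs to the current session.
is_owned <- function(obj) identical(attr(obj, "owner"), current_session())

options(demo.session_uuid = "session-A")
obj <- set_owner(42L)
owned_here <- is_owned(obj)        # TRUE

# Save the object, then pretend to reload it in a different session.
f <- tempfile(fileext = ".rds")
saveRDS(obj, f)
options(demo.session_uuid = "session-B")
obj2 <- readRDS(f)
owned_there <- is_owned(obj2)      # FALSE
unlink(f)
```

The attribute survives saveRDS()/readRDS(), so the mismatch is detected after
reloading rather than surfacing later as an obscure error.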
>
> I argue that such an identifier should be part of base R for easy
> access and to spare each developer from having to roll their own.
>
>
> # Possible implementation
>
> One proposal would be to bring in Simon Urbanek's 'uuid' package
> (https://cran.r-project.org/package=uuid) into base R.  This package
> provides:
>
> > uuid::UUIDgenerate()
> [1] "b7de6182-c9c1-47a8-b5cd-e5c8307a8efb"
>
> based on Theodore Ts'o's libuuid
> (https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/).  From
> 'man uuid_generate':
>
> "The uuid_generate function creates a new universally unique
> identifier (UUID). The uuid will be generated based on high-quality
> randomness from /dev/urandom, if available. If it is not available,
> then uuid_generate will use an alternative algorithm which uses the
> current time, the local ethernet MAC address (if available), and
> random data generated using a pseudo-random generator.
> [...]
> The UUID is 16 bytes (128 bits) long, which gives approximately
> 3.4x10^38 unique values (there are approximately 10^80 elementary
> particles in the universe according to Carl Sagan's Cosmos). The new
> UUID can reasonably be considered unique among all UUIDs created on
> the local system, and among UUIDs created on other systems in the past
> and in the future."
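
To make the 16-byte layout concrete, here is a minimal base-R sketch of a
random (version 4) UUID. It is not how libuuid or the 'uuid' package is
implemented; random_bytes() and uuid_v4() are hypothetical names, and the
fallback uses R's own pseudo-random generator, which is weaker than
/dev/urandom.

```r
# Read n random bytes from /dev/urandom if present, else fall back to R's PRNG.
random_bytes <- function(n) {
  if (file.exists("/dev/urandom")) {
    con <- file("/dev/urandom", open = "rb")
    on.exit(close(con))
    readBin(con, what = "raw", n = n)
  } else {
    as.raw(sample.int(256L, n, replace = TRUE) - 1L)
  }
}

# Format 16 random bytes as a version 4 UUID string.
uuid_v4 <- function() {
  b <- random_bytes(16L)
  b[7] <- (b[7] & as.raw(0x0f)) | as.raw(0x40)  # set version nibble to 4
  b[9] <- (b[9] & as.raw(0x3f)) | as.raw(0x80)  # set variant bits to 10
  hex <- as.character(b)                        # two hex digits per byte
  paste(
    paste(hex[1:4],   collapse = ""),
    paste(hex[5:6],   collapse = ""),
    paste(hex[7:8],   collapse = ""),
    paste(hex[9:10],  collapse = ""),
    paste(hex[11:16], collapse = ""),
    sep = "-"
  )
}

id <- uuid_v4()
```

Six of the 128 bits are fixed by the version and variant fields, leaving 122
random bits per identifier.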
>
> An alternative that does not require adding a dependency on the
> libuuid library would be to roll a poor man's version based on a set
> of semi-unique attributes, e.g.
>
> make_id <- function(...) {
>   args <- list(...)
>   saveRDS(args, file = f <- tempfile())
>   on.exit(file.remove(f))
>   unname(tools::md5sum(f))
> }
>
> session_id <- local({
>   id <- NULL
>   function() {
>     if (is.null(id)) {
>       id <<- make_id(
>         info    = Sys.info(),
>         pid     = Sys.getpid(),
>         tempdir = tempdir(),
>         time    = Sys.time(),
>         random  = sample.int(.Machine$integer.max, size = 1L)
>       )
>     }
>     id
>   }
> })
>
> Example:
>
> > session_id()
> [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
> > session_id()
> [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
> /Henrik
>
> PS. Having a built-in make_id() function would be handy too, e.g. when
> creating object-specific identifiers for other purposes.
>
> PPS. It would be neat if there was an object, or connection, interface
> for tools::md5sum(), which currently only operates on files sitting on
> the file system. The digest package provides this functionality.
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
