[Rd] Request: tools::md5sum should accept connections and finally in-memory objects
Dénes Tóth
toth@dene@ @end|ng |rom kogentum@hu
Sat May 2 00:07:43 CEST 2020
On 5/1/20 11:35 PM, Duncan Murdoch wrote:
> The tools package is not for users, it's for functions that R uses in
> installing packages, checking them, etc.
I think the target group for this functionality is the group of R
developers, not regular R users.
> If you want a function for
> users, it would belong in utils. But what's wrong with the digest
> package? What's the argument that R Core should take this on?
There is nothing wrong with the digest package except for being an extra
dependency which could be avoided if an already implemented C function
were available at the R level.
I do understand that, given the load on R Core, they take on new
features and the related maintenance burden only if it is absolutely
necessary. This is why I asked first whether there is a particular
reason not to expose an already existing (base-R) implementation. I
think it is reasonable to assume that 'md5_buffer' exists for a reason -
but probably there is a reason why it never became part of any exported
function. Now I checked the history of the md5.c file; it was last
edited 8 years ago. Somewhat surprisingly, md5_buffer was already
included in the original file (created 17 years ago), but marked as
UNUSED 12 years ago.
Just to clarify: I do not want to suggest that the R Core team should
take over all functionality of the digest package. I am focusing only on
computing MD5 digests, which is already possible for files. My
suggestion for a more general function was meant to keep potential
further enhancements in mind.
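
For reference, the file-based route that is already possible can be
sketched as below. This is only a minimal illustration, not a proposed
implementation: md5_object is a hypothetical helper name, and it assumes
that hashing the bytes produced by serialize() is an acceptable stand-in
for hashing the object itself. Because the serialization header records
the R version, digests obtained this way may not be comparable across R
versions.

```r
# Hypothetical helper (not part of base R): MD5 digest of an in-memory
# object, obtained by writing its serialized bytes to a temporary file
# and hashing that file with the existing tools::md5sum().
md5_object <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)    # remove the temporary file on exit
  writeBin(serialize(x, NULL), tmp)   # raw vector -> binary file
  unname(tools::md5sum(tmp))          # drop the file-name attribute
}

x <- runif(10)
h <- md5_object(x)
nchar(h)                      # 32 hexadecimal characters
identical(h, md5_object(x))   # same object, same digest
```

The detour through a temporary file is exactly the overhead that an
R-level entry point to the existing C code would avoid.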
>
> Duncan Murdoch
>
> On 01/05/2020 5:00 p.m., Dénes Tóth wrote:
>>
>> AFAIK there is no hashing utility in base R which can create hash
>> digests of arbitrary R objects. However, as also described by Henrik
>> Bengtsson in [1], we have tools::md5sum() which calculates MD5 hashes of
>> files. Calculating hashes of in-memory objects is a very common task in
>> several areas, as demonstrated by the popularity of the 'digest' package
>> (~850,000 downloads/month).
>>
>> Upon inspection of the relevant files in the R source (e.g., [2] and
>> [3]), it seems all building blocks have already been implemented so that
>> hashing should not be restricted to files. I would like to ask:
>>
>> 1) Why is md5_buffer unused?
>> In src/library/tools/src/md5.c [see 2], md5_buffer is implemented, which
>> seems to be the counterpart of md5_stream for non-file inputs:
>>
>> ---
>> #ifdef UNUSED
>> /* Compute MD5 message digest for LEN bytes beginning at BUFFER. The
>> result is always in little endian byte order, so that a byte-wise
>> output yields to the wanted ASCII representation of the message
>> digest. */
>> static void *
>> md5_buffer (const char *buffer, size_t len, void *resblock)
>> {
>> struct md5_ctx ctx;
>>
>> /* Initialize the computation context. */
>> md5_init_ctx (&ctx);
>>
>> /* Process whole buffer but last len % 64 bytes. */
>> md5_process_bytes (buffer, len, &ctx);
>>
>> /* Put result in desired memory area. */
>> return md5_finish_ctx (&ctx, resblock);
>> }
>> #endif
>> ---
>>
>> 2) How can the R-community help so that this feature becomes available
>> in package 'tools'?
>>
>> Suggestions:
>> As a first step, it would be great if tools::md5sum would support
>> connections (credit goes to Henrik for the idea). E.g., instead of the
>> signature tools::md5sum(files), we could have tools::md5sum(files, conn
>> = NULL), which would allow:
>>
>> x <- runif(10)
>> tools::md5sum(conn = rawConnection(serialize(x, NULL)))
>>
>> To avoid the inconsistency between 'files' (which computes the hash
>> digests in a vectorized manner, that is, one for each file) and 'conn'
>> (which expects a single connection), and to make it easier to extend the
>> hashing for other algorithms without changing the main R interface, a
>> more involved solution would be to introduce tools::hash and
>> tools::hashes, in a similar vein to digest::digest and
>> digest::getVDigest.
>>
>> Regards,
>> Denes
>>
>>
>> [1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
>> [2]:
>> https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172
>>
>> [3]:
>> https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27
>>
>>