[Rd] Package compression benchmarks for zstd vs gzip

Simon Urbanek simon.urbanek using R-project.org
Mon Jan 13 02:26:14 CET 2025


I think the first step would have to be to add zstd support to R. zstd is a bit controversial (as shown by the community blowback against the changes you mentioned) and their build system (calling it that is being very generous) is a mess, so it would require a bit of testing, but it is doable.

That said, assuming the above is solved, we have been debating a change of compression at CRAN in general for a while, but assumptions about file names are built into today’s tools, so there would certainly be some fallout - not just in R, but also in the ecosystems around it. As you pointed out, possibly the safest place to start is binaries, since we have tighter control of those and they are used in fewer places.

Personally, I think the higher priority is signing, so as we address that we may just include the compression change with it, since it will require some tool changes anyway. I was thinking of using xz, as that is more stable, already supported and less controversial, but I don’t think the choice really matters - it just has to be a compression format that R supports (zstd and xz have different benefits, so it’s always a trade-off without a clear winner).

Cheers,
Simon


> On 11 Jan 2025, at 12:16, Jeroen Ooms <jeroenooms using gmail.com> wrote:
> 
> Many distros and browsers these days use zstd as the preferred
> compression method. For example, if you unpack a .deb or .rpm file on
> Debian or Fedora there is a zstd archive inside. It is claimed that zstd
> offers improved compression over gzip, but (unlike lzma) it has
> comparable decompression speed. Maybe it is interesting to get an
> estimate of how much R packages would benefit from zstd.
> 
> Testing this for source packages and macOS binary packages is easy,
> as we can gunzip and recompress tar.gz files without having to extract
> the tarball itself:
> 
>  OUTPUT="sizes.txt"
>  echo "FILE GZIP ZSTD" > "$OUTPUT"
>  for x in *gz; do
>    FILE=$(basename "$x")
>    GZIP=$(wc -c "$x" | awk '{print $1}')
>    ZSTD=$(gunzip -c "$x" | zstd -19 | wc -c)
>    echo "$FILE $GZIP $ZSTD" | tee -a "$OUTPUT"
>  done
> 
> Attached are results of running this script on the 500 most downloaded
> CRAN packages. It shows about 16% size reduction for sources, and 19%
> for binaries.
> 
> Zstd is BSD licensed C code that can easily be embedded in any project.
> <sources.txt><binaries.txt>
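
For anyone rerunning the quoted script, here is a minimal sketch of how the resulting sizes.txt could be turned into a single size-reduction figure. It assumes the file has the "FILE GZIP ZSTD" header followed by whitespace-separated rows with no spaces in file names. The function name summarise_sizes is hypothetical, and the sketch reports the reduction in total bytes, which may not be how the percentages quoted above were computed.

```shell
# Summarise a sizes file: header line "FILE GZIP ZSTD", then one
# "name gzip_bytes zstd_bytes" row per package (assumed layout).
summarise_sizes() {
  awk 'NR > 1 && $2 > 0 {
         # Accumulate total bytes for each compression method.
         gz += $2; zs += $3; n++
       }
       END {
         if (n == 0) { print "no data rows"; exit 1 }
         # Reduction is relative to the total gzip size.
         printf "files=%d gzip=%.0f zstd=%.0f reduction=%.1f%%\n",
                n, gz, zs, 100 * (gz - zs) / gz
       }' "$1"
}
```

Calling `summarise_sizes sizes.txt` in the directory where the script was run prints one summary line. A per-package mean could be computed instead by averaging the individual ratios, which weights small packages more heavily than this byte-total approach.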


