[Rd] Package compression benchmarks for zstd vs gzip

Avraham Adler
Sun Jan 12 00:41:28 CET 2025


zstd is already accessible from R through the archive package [1]. I use
it all the time when saving large objects, with code I adapted from [2].
Is your suggestion to import the zstd library/source code into base R?

[1] https://CRAN.R-project.org/package=archive
[2] https://coolbutuseless.github.io/2018/10/02/using-lz4-and-zstandard-to-compress-files-with-saverds/

On Fri, Jan 10, 2025 at 6:17 PM Jeroen Ooms <jeroenooms using gmail.com> wrote:
>
> Many distros and browsers these days use zstd as the preferred
> compression method. For example, if you unpack a .deb or .rpm file on
> Debian or Fedora, there is a zstd archive inside. zstd is claimed to
> compress better than gzip while, unlike lzma, decompressing at a
> comparable speed. It would be interesting to estimate how much R
> packages would benefit from zstd.
>
> Testing this for source packages and macOS binary packages is easy,
> as we can gunzip and recompress the tar.gz files without having to
> extract the tarball itself:
>
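>   OUTPUT="sizes.txt"
>   echo "FILE GZIP ZSTD" > "$OUTPUT"
>   for x in *gz; do
>     FILE=$(basename "$x")
>     # Size in bytes of the existing gzip tarball
>     GZIP=$(wc -c < "$x" | awk '{print $1}')
>     # Size in bytes after recompressing the same tar stream with zstd
>     ZSTD=$(gunzip -c "$x" | zstd -19 | wc -c | awk '{print $1}')
>     echo "$FILE $GZIP $ZSTD" | tee -a "$OUTPUT"
>   done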
>
> Attached are the results of running this script on the 500 most
> downloaded CRAN packages. They show about a 16% size reduction for
> sources and 19% for binaries.
>
> Zstd is BSD-licensed C code that can easily be embedded in any project.
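
For anyone who wants to turn a sizes.txt produced by the quoted loop into
a single percentage, here is a minimal sketch. It assumes the
"FILE GZIP ZSTD" layout with one header line, and summarize_sizes is a
hypothetical helper name, not part of any tool. It reports the aggregate
reduction over total bytes, which is not necessarily how the figures
quoted above were computed (a mean of per-package ratios would differ).

```shell
# summarize_sizes FILE
# Hypothetical helper: sums the GZIP and ZSTD columns of a
# "FILE GZIP ZSTD" table and prints the aggregate size reduction.
summarize_sizes() {
  awk '
    # Skip the header line; take the last two fields as the sizes so
    # that a file name containing spaces does not shift the columns.
    NR > 1 && NF >= 3 {
      gz += $(NF - 1)
      zs += $NF
      n++
    }
    END {
      if (n == 0 || gz == 0) { print "no data"; exit 1 }
      printf "packages=%d gzip_bytes=%.0f zstd_bytes=%.0f reduction=%.1f%%\n",
             n, gz, zs, 100 * (1 - zs / gz)
    }
  ' "$1"
}

# Only summarize when a sizes.txt is actually present.
if [ -f sizes.txt ]; then
  summarize_sizes sizes.txt
fi
```

Summing bytes first weights large packages more heavily, which is closer
to what a mirror would actually save in storage and bandwidth.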


