[Rd] Compression (really about LazyData)

Prof Brian Ripley ripley using stats.ox.ac.uk
Fri Feb 19 11:28:14 CET 2021


On 18/02/2021 18:30, Therneau, Terry M., Ph.D. via R-devel wrote:
> This is a CRAN question:
> 
> I have taken care to compress files in the data directory using "xz" (and checked that it
> is the best).  Is there then any impact or use for the LazyDataCompression option in the
> DESCRIPTION file?
> 

I have difficulty comprehending that, so I will answer what I guess 
you meant to ask.

What LazyDataCompression does is completely separate from the contents 
of the data directory.  As the manual says:

<quote>
Some packages using ‘LazyData’ will benefit from using a form of 
compression other than gzip in the installed lazy-loading database. This 
can be selected by the --data-compress option to R CMD INSTALL or by 
using the ‘LazyDataCompression’ field in the DESCRIPTION file. Useful 
values are bzip2, xz and the default, gzip. The only way to discover 
which is best is to try them all and look at the size of the 
pkgname/data/Rdata.rdb file.
</quote>
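
As a rough sketch of "try them all": the shell function below installs 
a source package once per compression type into a throwaway library and 
prints the size of the resulting Rdata.rdb.  The function name 
compare_lazydata_compression and the R_BIN override are hypothetical, 
introduced only for this illustration; the --data-compress and --library 
options are those of R CMD INSTALL.  If the DESCRIPTION file already 
sets LazyDataCompression, that setting may take precedence over the 
command-line option, so it is probably safest to leave the field out 
while comparing.

```shell
#!/bin/sh
# Sketch only: install a source package once per compression type into a
# throwaway library and print the size of the lazy-loading database.
# compare_lazydata_compression and R_BIN are hypothetical names.
# Usage: compare_lazydata_compression path/to/pkg pkgname
compare_lazydata_compression() {
    pkg_src=$1     # source directory or tarball of the package
    pkg_name=$2    # package name as given in its DESCRIPTION
    for comp in gzip bzip2 xz; do
        lib=$(mktemp -d) || return 1
        if "${R_BIN:-R}" CMD INSTALL --data-compress="$comp" \
               --library="$lib" "$pkg_src" >/dev/null 2>&1; then
            rdb="$lib/$pkg_name/data/Rdata.rdb"
            if [ -f "$rdb" ]; then
                # Size in bytes; tr strips the padding some wc versions add.
                echo "$comp $(wc -c < "$rdb" | tr -d ' ')"
            else
                # No database: the package was not installed with LazyData.
                echo "$comp no-Rdata.rdb"
            fi
        else
            echo "$comp install-failed"
        fi
        rm -rf "$lib"
    done
}
```

Each output line gives a compression type followed by the size in bytes 
of pkgname/data/Rdata.rdb; the smallest is the candidate for 
LazyDataCompression.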

When a package is installed with LazyData (and you neglected to tell us 
if that is the case), the datasets in the data directory are loaded (and 
hence decompressed), and stored in a database.  For a LazyData package 
the compression used in the data directory only affects the source 
package size (I guess your criterion for 'best') and how fast it is 
installed (rarely a consideration but there have been LazyData packages 
where installing the data takes most of the time).  At run-time only the 
compression specified by LazyDataCompression is relevant.
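
For illustration, the relevant part of a DESCRIPTION file for such a 
package might look like the fragment below.  The value xz is only an 
example; which value is best depends on the package's data.

```
LazyData: true
LazyDataCompression: xz
```

With these fields the installed lazy-loading database is compressed 
with xz regardless of how the files in the data directory were saved.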

-- 
Brian D. Ripley,                  ripley using stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford


