[Rd] Compressing data for package builds

Uwe Ligges ligges at statistik.tu-dortmund.de
Fri Aug 17 12:01:33 CEST 2012



On 17.08.2012 07:24, steven mosher wrote:
> " R CMD build is how you preferably should be creating your package tar
> ball, so you simply add the --resave-data argument to your already existing
> R CMD build call which creates the tar ball from your source directory. So
> can you elaborate on "doesn't do anything I can see"? In what sense? No
> output? No compression? "
>
> my tarball builds with   > R CMD build mattools
>
> where mattools is the name of the package. and I get a warning on R CMD
> check.
>
> Things I tried
>
> R CMD build --resave-data
> R CMD build mattools --resave-data
> R CMD build --resave-data mattools
>
> The first does nothing; the second and third both fail on unknown
> options. So I found the help for R CMD
>
> Now that I figured out how to display help for  R CMD build  I see that
>
> --resave-data   must include a specification of the type of compression
>
> --resave-data="best"   for example
>
> I ran that and got the same warning, indicating that the .rda file had not
> been compressed.
>
>   checking data for non-ASCII characters ... OK
> * checking data for ASCII and uncompressed saves ... WARNING
>    Warning: large data file(s) saved inefficiently:
>                  size ASCII compress
>    zagoskin.rda 137Kb FALSE     none
>
>    Note: significantly better compression could be obtained
>          by using R CMD build --resave-data
>                 old_size new_size compress
>    modpoll.rda     124Kb     78Kb       xz
>    zagoskin.rda    137Kb      6Kb    bzip2
>
> Building under windows so I wonder if I am missing a system file required
> to do the compression.



Are you checking the tarball (as recommended) or the source dir? The 
compressed versions are in the tarball. The source dir is not changed.
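
A minimal sketch of that workflow, assuming a POSIX shell and a source
directory named mattools as in the thread. The function name
build_and_check and the R_BIN override are hypothetical helpers for
illustration, not part of R. As far as I know, "best" lets the build try
gzip, bzip2 and xz and keep the smallest result.

```shell
#!/bin/sh
# Build a package tarball with re-compressed data, then check the tarball.
# build_and_check and R_BIN are illustrative names, not part of R itself.

build_and_check() {
    pkg="$1"
    r_bin="${R_BIN:-R}"   # assumption: R is on the PATH unless R_BIN is set

    # Options come before the package directory.
    "$r_bin" CMD build --resave-data=best "$pkg" || return 1

    # Check the tarball, not the source directory: only the tarball
    # contains the re-saved .rda files. Every matching tarball in the
    # current directory is checked, so remove stale ones first.
    for tarball in "$pkg"_*.tar.gz; do
        if [ ! -e "$tarball" ]; then
            echo "no tarball found for $pkg" >&2
            return 1
        fi
        "$r_bin" CMD check "$tarball" || return 1
    done
}
```

Usage would be `build_and_check mattools`, run from the directory that
contains the source dir. To change the files in the source dir itself, the
tools functions quoted below can be called from R, e.g.
tools::resaveRdaFiles("mattools/data", compress = "auto").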

Uwe Ligges






>
> On Thu, Aug 16, 2012 at 5:48 PM, Simon Urbanek
> <simon.urbanek at r-project.org>wrote:
>
>>
>> On Aug 16, 2012, at 5:08 PM, steven mosher wrote:
>>
>>> Hi,
>>>
>>> I have two .rda files that I need to include in a package. I've placed
>>> them both in a data directory. After save() they are around 150Kb each.
>>>
>>> When I try to check the package I get the following warning
>>>
>>> Warning: large data file(s) saved inefficiently:
>>>                 size ASCII compress
>>>   zagoskin.rda 137Kb FALSE     none
>>>
>>>   Note: significantly better compression could be obtained
>>>         by using R CMD build --resave-data
>>>                old_size new_size compress
>>>   modpoll.rda     124Kb     78Kb       xz
>>>   zagoskin.rda    137Kb      6Kb    bzip2
>>>
>>> Both of these files modpoll.rda and zagoskin.rda  have already been
>>> compressed from megabytes down to Kb.
>>>
>>> Also, the instruction "R CMD build --resave-data" doesn't do anything
>>> that I can see, so I must be using it wrong.
>>
>> R CMD build is how you preferably should be creating your package tar
>> ball, so you simply add the --resave-data argument to your already existing
>> R CMD build call which creates the tar ball from your source directory. So
>> can you elaborate on "doesn't do anything I can see"? In what sense? No
>> output? No compression?
>>
>> Cheers,
>> Simon
>>
>>
>>> Is there a piece of the puzzle I am missing or instructions better than
>>> these: I tried  LazyDataCompression and my
>>> data.rdb is 90Kb.
>>>
>>> "Package *tools* has a couple of functions to help with data images:
>>> checkRdaFiles reports on the way the image was saved, and resaveRdaFiles
>>> will re-save with a different type of compression, including choosing the
>>> best type for that particular image.
>>>
>>> Some packages using 'LazyData' will benefit from using a form of
>>> compression other than gzip in the installed lazy-loading database. This
>>> can be selected by the --data-compress option to R CMD INSTALL or by using
>>> the 'LazyDataCompression' field in the DESCRIPTION file. Useful values are
>>> bzip2, xz and the default, gzip. The only way to discover which is best is
>>> to try them all and look at the size of the pkgname/data/Rdata.rdb file."
>>>
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>


