[Rd] R CMD build --resave-data

Simon Urbanek simon.urbanek at r-project.org
Wed Apr 13 04:06:10 CEST 2011


On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:

> Hi Uwe,
> 
> On 11-04-11 08:13 AM, Uwe Ligges wrote:
>> 
>> 
>> On 11.04.2011 02:47, Hervé Pagès wrote:
>>> Hi,
>>> 
>>> More about the new --resave-data option
>>> 
>>> As mentioned previously here
>>> 
>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
>>> 
>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
>>> inconsistently. The former does --resave-data="gzip" by default.
>>> The latter doesn't seem to support the --resave-data= syntax:
>>> the --resave-data flag must either be present or not. And by
>>> default 'R CMD INSTALL' won't resave the data.
>>> 
>>> Also, because now 'R CMD build' is resaving the data, shouldn't it
>>> reinstall the package in order to be able to do this correctly?
>>> 
>>> Here is why. There is this new warning in 'R CMD check' that complains
>>> about files not of a type allowed in a 'data' directory:
>>> 
>>> 
>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
>>> 
>>> 
>>> 
>>> The Icens package also has .R files under data/ with things like:
>>> 
>>> bet <- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)
>>> 
>>> i.e. the R code needs to access some of the text files located
>>> in the data/ folder. So in order to get rid of this warning I
>>> tried to move those text files to inst/extdata/ and I modified
>>> the code in the .R file so it does:
>>> 
>>> CMVdata_filepath <- system.file("extdata", "CMVdata", package="Icens")
>>> bet <- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)
>>> 
>>> But now 'R CMD build' fails to resave the data because the package
>>> was not installed first and the CMVdata file could not be found.
>>> 
>>> Unfortunately, for a lot of people that means that the safe way to
>>> build a source tarball now is with
>>> 
>>> R CMD build --keep-empty-dirs --no-resave-data
>> 
>> 
>> Hervé,
>> 
>> actually it makes some sense to have these defaults from a CRAN
>> maintainer's point of view:
>> 
>> --keep-empty-dirs:
>> we found many packages containing empty dirs unnecessarily and the idea
>> is to exclude them at the build stage rather than at the later
>> installation stage. Note that the package maintainer is supposed to run
>> build (and knows whether the empty dirs are to be included; the user who
>> runs INSTALL does not).
>> 
>> --no-resave-data:
>> We found many packages with insufficiently compressed data. This should
>> be fixed when building the package, not later when installing it, since
>> the reduced size is useful in the source tarball already.
>> 
>> So it does make some sense to have different defaults in build as
>> opposed to INSTALL from my point of view (although I could live with
>> different ones, though).
> 
> If you deliberately ignore the fact that 'R CMD INSTALL' is also used
> by developers to install from the *package source tree* (as opposed
> to end users who use it to install from a *source tarball*,

... for a good reason. IMHO no serious developer would do that, for obvious reasons: you'd be working on a dirty copy, creating many unnecessary problems and polluting your sources. The first time you spend an hour chasing a non-existent problem due to stale binary objects in your tree, you'll learn that lesson ;). The fraction of a second spent in R CMD build is well worth the hours saved. IMHO the only valid reason to run INSTALL on a directory (a freshly unpacked tarball) is to capture config.log.
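
A rough sketch of that build-first workflow, in case it helps: build the tarball, then install from the tarball rather than from the working tree. The function names and the R_BIN override are hypothetical, made up for this illustration. It assumes DESCRIPTION has plain "Package:" and "Version:" lines and relies on 'R CMD build' writing <Package>_<Version>.tar.gz into the current directory.

```shell
#!/bin/sh
# Sketch only: tarball_name, build_then_install and R_BIN are
# hypothetical names for this example, not part of R.

# Print "<Package>_<Version>.tar.gz" as read from <pkg_dir>/DESCRIPTION.
tarball_name() {
    desc="$1/DESCRIPTION"
    # Strip trailing whitespace, then print the value of the matching field.
    pkg=$(sed -n -e 's/[[:space:]]*$//' -e 's/^Package:[[:space:]]*//p' "$desc" | head -n 1)
    ver=$(sed -n -e 's/[[:space:]]*$//' -e 's/^Version:[[:space:]]*//p' "$desc" | head -n 1)
    [ -n "$pkg" ] && [ -n "$ver" ] || return 1
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# Build the source tarball in the current directory, then install from
# the tarball so the working tree is never touched by INSTALL.
# R_BIN (assumption) lets you substitute another R binary, or "echo"
# for a dry run.
build_then_install() {
    pkg_dir="$1"
    tarball=$(tarball_name "$pkg_dir") || return 1
    "${R_BIN:-R}" CMD build "$pkg_dir" || return 1
    "${R_BIN:-R}" CMD INSTALL "$tarball"
}
```

Usage would be something like 'build_then_install path/to/pkg' from the directory where the tarball should land; setting R_BIN=echo first only prints the two commands.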

Cheers,
Simon



> even though
> they don't use it directly), then you have a point. So maybe I should
> have been more explicit about how much of a problem it is for the
> *developer* to have 'R CMD build' and 'R CMD INSTALL' behave
> differently by default.
> 
> Of course I'm not suggesting that 'R CMD INSTALL' should behave
> differently (by default) depending on whether it's used on a source
> tarball (mode 1) or a package source tree (mode 2).
> 
> I'm suggesting that, by default, the 3 commands (R CMD build +
> R CMD INSTALL in mode 1 and 2) behave consistently.
> 
> With the latest changes, and by default, 'R CMD INSTALL' is still doing
> the right thing, but not 'R CMD build' anymore.
> 
> I perfectly understand the intention behind those new flags, which is
> to try to "optimize" the resulting source tarball but what would you
> think if 'gcc' had some optimization flags that can generate broken
> executables (under some circumstances) and if these flags were enabled
> by default?
> 
> Note that I would have no problem with 'R CMD build' trying to resave
> the data by default if the current implementation of that feature
> was working properly, but unfortunately it's broken (see my previous
> email for the details).
> 
> Thanks,
> H.
> 
>> 
>> If you need further arguments for the discussion: I also tend to use
>> --no-vignettes nowadays if my code does not change considerably. ;-)
>> 
>> Best wishes,
>> Uwe
>> 
>> 
>> 
>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
>>> build' (and make it consistent with R CMD INSTALL) is not going to
>>> grow too much ;-)
>>> 
>>> Thanks,
>>> H.
>>> 
>>> 
> 
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 


