[Rd] R CMD build --resave-data
Martin Maechler
maechler at stat.math.ethz.ch
Wed Apr 13 14:45:37 CEST 2011
>>>>> Hervé Pagès <hpages at fhcrc.org>
>>>>> on Tue, 12 Apr 2011 22:21:58 -0700 writes:
> On 11-04-12 07:06 PM, Simon Urbanek wrote:
>>
>> On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:
>>
>>> Hi Uwe,
>>>
>>> On 11-04-11 08:13 AM, Uwe Ligges wrote:
>>>>
>>>>
>>>> On 11.04.2011 02:47, Hervé Pagès wrote:
>>>>> Hi,
>>>>>
>>>>> More about the new --resave-data option
>>>>>
>>>>> As mentioned previously here
>>>>>
>>>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
>>>>>
>>>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
>>>>> inconsistently. The former does --resave-data="gzip" by
>>>>> default. The latter doesn't seem to support the
>>>>> --resave-data= syntax: the --resave-data flag must either be
>>>>> present or not. And by default 'R CMD INSTALL' won't resave
>>>>> the data.
>>>>>
>>>>> Also, because now 'R CMD build' is resaving the data,
>>>>> shouldn't it reinstall the package in order to be able to do
>>>>> this correctly?
>>>>>
>>>>> Here is why. There is this new warning in 'R CMD check' that
>>>>> complains about files not of a type allowed in a 'data'
>>>>> directory:
>>>>>
>>>>>
>>>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
>>>>>
>>>>>
>>>>>
>>>>> The Icens package also has .R files under data/ with things
>>>>> like:
>>>>>
>>>>> bet<- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)
>>>>>
>>>>> i.e. the R code needs to access some of the text files
>>>>> located in the data/ folder. So in order to get rid of this
>>>>> warning I tried to move those text files to inst/extdata/
>>>>> and I modified the code in the .R file so it does:
>>>>>
>>>>> CMVdata_filepath<- system.file("extdata", "CMVdata",
>>>>>                                package="Icens")
>>>>> bet<- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)
>>>>>
>>>>> But now 'R CMD build' fails to resave the data because the
>>>>> package was not installed first and the CMVdata file could
>>>>> not be found.
>>>>>
>>>>> Unfortunately, for a lot of people that means that the safe
>>>>> way to build a source tarball now is with
>>>>>
>>>>> R CMD build --keep-empty-dirs --no-resave-data
>>>>
>>>>
>>>> Hervé,
>>>>
>>>> actually it makes some sense to have these defaults from a
>>>> CRAN maintainer's point of view:
>>>>
>>>> --keep-empty-dirs: we found many packages containing empty
>>>> dirs unnecessarily and the idea is to exclude them at the
>>>> build stage rather than at the later installation stage. Note
>>>> that the package maintainer is supposed to run build (and
>>>> knows if the empty dirs are to be included, the user who runs
>>>> INSTALL does not).
>>>>
>>>> --no-resave-data: We found many packages with insufficiently
>>>> compressed data. This should be fixed when building the
>>>> package, not later when installing it, since the reduced size
>>>> is useful in the source tarball already.
>>>>
>>>> So it does make some sense to have different defaults in
>>>> build as opposed to INSTALL from my point of view (although I
>>>> could live with different ones, though).
>>>
>>> If you deliberately ignore the fact that 'R CMD INSTALL' is
>>> also used by developers to install from the *package source
>>> tree* (as opposed to end users who use it to install from a
>>> *source tarball*,
>>
>> .. for a good reason, IMHO no serious developer would do that
>> for obvious reasons -
> This sounds like saying that no serious developer working on a
> big project involving a lot of files to compile should use
> 'make'. I mean, serious developers like you *always* do 'make
> clean' before they do 'make' on the R tree when they need to
> test a change, even a small one? And this only takes a "fraction
> of a second" for them? Hey, I'd love to be able to do that too!
> ;-)
> H.
>> you'd be working on a dirty copy creating many unnecessary
>> problems and polluting your sources. The first time you'll
>> spend an hour chasing a non-existent problem due to stale
>> binary objects in your tree you'll learn that lesson ;). The
>> fraction of a second spent in R CMD build is well worth the
>> hours saved. IMHO the only valid reason to run INSTALL on a
>> (freshly unpacked tar ball) directory is to capture config.log.
>>
>> Cheers, Simon
>>
>>
>>
>>> even though they don't use it directly), then you have a
>>> point. So maybe I should have been more explicit about the
>>> problem that it can be for the *developer* to have 'R CMD
>>> build' and 'R CMD INSTALL' behave differently by default.
>>>
>>> Of course I'm not suggesting that 'R CMD INSTALL' should
>>> behave differently (by default) depending on whether it's used
>>> on a source tarball (mode 1) or a package source tree (mode
>>> 2).
>>>
>>> I'm suggesting that, by default, the 3 commands (R CMD build +
>>> R CMD INSTALL in mode 1 and 2) behave consistently.
>>>
>>> With the latest changes, and by default, 'R CMD INSTALL' is
>>> still doing the right thing, but not 'R CMD build' anymore.
>>>
>>> I perfectly understand the intention behind those new flags,
>>> which is to try to "optimize" the resulting source tarball but
>>> what would you think if 'gcc' had some optimization flags that
>>> can generate broken executables (under some circumstances) and
>>> if these flags were enabled by default?
>>>
>>> Note that I would have no problem with 'R CMD build' trying to
>>> resave the data by default if the current implementation of
>>> that feature was working properly, but unfortunately it's
>>> broken (see my previous email for the details).
>>>
>>> Thanks, H.
>>>
>>>>
>>>> If you need further arguments for the discussion: I also tend to use
>>>> --no-vignettes nowadays if my code does not change considerably. ;-)
>>>>
>>>> Best wishes,
>>>> Uwe
>>>>
>>>>
>>>>
>>>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
>>>>> build' (and make it consistent with R CMD INSTALL) is not going to
>>>>> grow too much ;-)
;-)
I'm with Herve here.
I almost always use R CMD INSTALL on a directory rather than a
tarball... though most of the time the directory is freshly
untarred.
Other times, however, one of the reasons is exactly that I can
keep things around (*.o, ...) which are only rebuilt very
rarely.
Martin
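
For reference, the two workflows argued over in this thread can be sketched in shell. This is only an illustration under stated assumptions: the helper names `build_cmd` and `build_and_install` and the `PKG_DIR` variable are made up for the example, and it assumes an R version whose 'R CMD build' accepts the --keep-empty-dirs and --no-resave-data flags quoted above.

```shell
#!/bin/sh
# Sketch: build a source tarball with the conservative flags quoted in the
# thread, then install from the tarball rather than from the source tree.
# PKG_DIR is a hypothetical path to a package source directory.

# Print the build command line without running it, so it can be inspected.
build_cmd() {
    printf 'R CMD build --keep-empty-dirs --no-resave-data %s\n' "$1"
}

# Run build + install only when R is on the PATH and the directory exists.
build_and_install() {
    pkg_dir=$1
    if ! command -v R >/dev/null 2>&1; then
        echo "R not found on PATH; skipping" >&2
        return 1
    fi
    [ -d "$pkg_dir" ] || { echo "no such directory: $pkg_dir" >&2; return 1; }
    R CMD build --keep-empty-dirs --no-resave-data "$pkg_dir" || return 1
    # The tarball name comes from the Package and Version fields in DESCRIPTION.
    name=$(sed -n 's/^Package:[[:space:]]*//p' "$pkg_dir/DESCRIPTION")
    version=$(sed -n 's/^Version:[[:space:]]*//p' "$pkg_dir/DESCRIPTION")
    R CMD INSTALL "${name}_${version}.tar.gz"
}

# Show the command that would be run for the package directory.
build_cmd "${PKG_DIR:-Icens}"
```

Calling `build_and_install "$PKG_DIR"` follows the build-then-install-from-tarball route; running 'R CMD INSTALL' directly on the directory instead is the alternative that keeps previously compiled objects around between installs.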