[Rd] R CMD build --resave-data
Hervé Pagès
hpages at fhcrc.org
Wed Apr 13 07:21:58 CEST 2011
On 11-04-12 07:06 PM, Simon Urbanek wrote:
>
> On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:
>
>> Hi Uwe,
>>
>> On 11-04-11 08:13 AM, Uwe Ligges wrote:
>>>
>>>
>>> On 11.04.2011 02:47, Hervé Pagès wrote:
>>>> Hi,
>>>>
>>>> More about the new --resave-data option
>>>>
>>>> As mentioned previously here
>>>>
>>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
>>>>
>>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
>>>> inconsistently. The former does --resave-data="gzip" by default.
>>>> The latter doesn't seem to support the --resave-data= syntax:
>>>> the --resave-data flag must either be present or not. And by
>>>> default 'R CMD INSTALL' won't resave the data.
>>>>
>>>> Also, because now 'R CMD build' is resaving the data, shouldn't it
>>>> reinstall the package in order to be able to do this correctly?
>>>>
>>>> Here is why. There is this new warning in 'R CMD check' that complains
>>>> about files not of a type allowed in a 'data' directory:
>>>>
>>>>
>>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
>>>>
>>>>
>>>>
>>>> The Icens package also has .R files under data/ with things like:
>>>>
>>>> bet<- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)
>>>>
>>>> i.e. the R code needs to access some of the text files located
>>>> in the data/ folder. So in order to get rid of this warning I
>>>> tried to move those text files to inst/extdata/ and I modified
>>>> the code in the .R file so it does:
>>>>
>>>> CMVdata_filepath<- system.file("extdata", "CMVdata", package="Icens")
>>>> bet<- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)
>>>>
>>>> But now 'R CMD build' fails to resave the data because the package
>>>> was not installed first and the CMVdata file could not be found.
>>>>
>>>> Unfortunately, for a lot of people that means that the safe way to
>>>> build a source tarball now is with
>>>>
>>>> R CMD build --keep-empty-dirs --no-resave-data
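
For anyone who wants to script this, here is a minimal sketch that wraps the command above in a shell function. The names `run`, `build_source_tarball` and `DRY_RUN` are hypothetical helpers made up for illustration and are not part of R. Only the two flags come from the message above.

```shell
#!/bin/sh
# Sketch only: wraps the 'R CMD build' invocation quoted above.
# 'run', 'build_source_tarball' and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Build a source tarball from a package source tree without dropping
# empty directories and without resaving the data.
build_source_tarball() {
  pkg_dir=${1:?usage: build_source_tarball <package-source-dir>}
  run R CMD build --keep-empty-dirs --no-resave-data "$pkg_dir"
}
```

Calling `DRY_RUN=1 build_source_tarball Icens` only prints the command line, which makes it easy to check which flags would be passed before actually building.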
>>>
>>>
>>> Hervé,
>>>
>>> actually it makes some sense to have these defaults from a CRAN
>>> maintainer's point of view:
>>>
>>> --keep-empty-dirs:
>>> we found many packages containing empty dirs unnecessarily and the idea
>>> is to exclude them at the build stage rather than at the later
>>> installation stage. Note that the package maintainer is supposed to run
>>> build (and knows if the empty dirs are to be included, the user who runs
>>> INSTALL does not).
>>>
>>> --no-resave-data:
>>> We found many packages with insufficiently compressed data. This should
>>> be fixed when building the package, not later when installing it, since
>>> the reduced size is useful in the source tarball already.
>>>
>>> So it does make some sense to have different defaults in build as
>>> opposed to INSTALL from my point of view (although I could live with
>>> different ones, though).
>>
>> If you deliberately ignore the fact that 'R CMD INSTALL' is also used
>> by developers to install from the *package source tree* (as opposed
>> to end users who use it to install from a *source tarball*,
>
> .. for a good reason, IMHO no serious developer would do that for obvious reasons -
This sounds like saying that no serious developer working on a big
project involving a lot of files to compile should use 'make'.
I mean, serious developers like you *always* do 'make clean' before
they do 'make' on the R tree when they need to test a change, even
a small one? And this only takes a "fraction of a second" for them?
Hey, I'd love to be able to do that too! ;-)
H.
> you'd be working on a dirty copy creating many unnecessary problems and polluting your sources. The first time you spend an hour chasing a non-existent problem due to stale binary objects in your tree you'll learn that lesson ;). The fraction of a second spent in R CMD build is well worth the hours saved. IMHO the only valid reason to run INSTALL on a (freshly unpacked tarball) directory is to capture config.log.
>
> Cheers,
> Simon
>
>
>
>> even though
>> they don't use it directly), then you have a point. So maybe I should
>> have been more explicit about the problem that it can be for the
>> *developer* to have 'R CMD build' and 'R CMD INSTALL' behave
>> differently by default.
>>
>> Of course I'm not suggesting that 'R CMD INSTALL' should behave
>> differently (by default) depending on whether it's used on a source
>> tarball (mode 1) or a package source tree (mode 2).
>>
>> I'm suggesting that, by default, the 3 commands (R CMD build +
>> R CMD INSTALL in mode 1 and 2) behave consistently.
>>
>> With the latest changes, and by default, 'R CMD INSTALL' is still doing
>> the right thing, but not 'R CMD build' anymore.
>>
>> I perfectly understand the intention behind those new flags, which is
>> to try to "optimize" the resulting source tarball but what would you
>> think if 'gcc' had some optimization flags that can generate broken
>> executables (under some circumstances) and if these flags were enabled
>> by default?
>>
>> Note that I would have no problem with 'R CMD build' trying to resave
>> the data by default if the current implementation of that feature
>> was working properly, but unfortunately it's broken (see my previous
>> email for the details).
>>
>> Thanks,
>> H.
>>
>>>
>>> If you need further arguments for the discussion: I also tend to use
>>> --no-vignettes nowadays if my code does not change considerably. ;-)
>>>
>>> Best wishes,
>>> Uwe
>>>
>>>
>>>
>>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
>>>> build' (and make it consistent with R CMD INSTALL) is not going to
>>>> grow too much ;-)
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319