[Rd] R CMD build --resave-data

Hervé Pagès hpages at fhcrc.org
Wed Apr 13 19:54:41 CEST 2011


Hi Uwe,

On 11-04-13 10:50 AM, Uwe Ligges wrote:
>
>
> On 13.04.2011 02:53, Hervé Pagès wrote:
>> Hi Uwe,
>>
>> On 11-04-11 08:13 AM, Uwe Ligges wrote:
>>>
>>>
>>> On 11.04.2011 02:47, Hervé Pagès wrote:
>>>> Hi,
>>>>
>>>> More about the new --resave-data option
>>>>
>>>> As mentioned previously here
>>>>
>>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
>>>>
>>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
>>>> inconsistently. The former does --resave-data="gzip" by default.
>>>> The latter doesn't seem to support the --resave-data= syntax:
>>>> the --resave-data flag must either be present or not. And by
>>>> default 'R CMD INSTALL' won't resave the data.
>>>>
>>>> Also, because now 'R CMD build' is resaving the data, shouldn't it
>>>> reinstall the package in order to be able to do this correctly?
>>>>
>>>> Here is why. There is this new warning in 'R CMD check' that complains
>>>> about files not of a type allowed in a 'data' directory:
>>>>
>>>>
>>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
>>>>
>>>> The Icens package also has .R files under data/ with things like:
>>>>
>>>> bet <- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)
>>>>
>>>> i.e. the R code needs to access some of the text files located
>>>> in the data/ folder. So in order to get rid of this warning I
>>>> tried to move those text files to inst/extdata/ and I modified
>>>> the code in the .R file so it does:
>>>>
>>>> CMVdata_filepath <- system.file("extdata", "CMVdata", package="Icens")
>>>> bet <- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)
>>>>
>>>> But now 'R CMD build' fails to resave the data because the package
>>>> was not installed first and the CMVdata file could not be found.
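
To make the failure concrete: system.file() returns "" when the package (or the file) cannot be found, so scan() is handed an empty path at build time. Below is a minimal sketch of one possible workaround, not something Icens does. The helper name find_raw_file() and its src_dir argument are hypothetical, and the fallback assumes the data/*.R script runs with data/ as its working directory, which I have not verified for 'R CMD build'.

```r
# Hypothetical helper (not part of Icens): locate a raw data file whether or
# not the package is currently installed.
find_raw_file <- function(name, package,
                          src_dir = file.path("..", "inst", "extdata")) {
  # system.file() returns "" when the package or the file cannot be found.
  installed <- system.file("extdata", name, package = package)
  if (nzchar(installed)) return(installed)

  # Assumption: the script is run with data/ as the working directory, so
  # the source copy lives in ../inst/extdata.
  candidate <- file.path(src_dir, name)
  if (file.exists(candidate)) return(candidate)

  stop("cannot find '", name, "' for package '", package, "'")
}
```

The data/*.R file would then read something like
bet <- matrix(scan(find_raw_file("CMVdata", "Icens"), quiet=TRUE), ncol=5, byrow=TRUE).
Whether this is acceptable practice is a separate question; it only illustrates why the lookup fails when the package is not installed first.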
>>>>
>>>> Unfortunately, for a lot of people that means that the safe way to
>>>> build a source tarball now is with
>>>>
>>>> R CMD build --keep-empty-dirs --no-resave-data
>>>
>>>
>>> Hervé,
>>>
>>> actually it makes some sense to have these defaults from a CRAN
>>> maintainer's point of view:
>>>
>>> --keep-empty-dirs:
>>> we found many packages containing empty dirs unnecessarily and the idea
>>> is to exclude them at the build stage rather than at the later
>>> installation stage. Note that the package maintainer is supposed to run
>>> build (and knows if the empty dirs are to be included, the user who runs
>>> INSTALL does not).
>>>
>>> --no-resave-data:
>>> We found many packages with insufficiently compressed data. This should
>>> be fixed when building the package, not later when installing it, since
>>> the reduced size is useful in the source tarball already.
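
For readers who want to see the size effect outside of 'R CMD build', here is a small sketch using tools::checkRdaFiles() and tools::resaveRdaFiles(). As far as I understand, the build step does something similar for .rda files, but treat that as an assumption. The data frame and file name are made up for illustration.

```r
library(tools)

# Hypothetical example data, saved without any compression.
f <- file.path(tempdir(), "example.rda")
x <- data.frame(id = 1:10000, grp = rep(letters[1:4], 2500))
save(x, file = f, compress = FALSE)
before <- file.info(f)$size

# Resave the same file in place with gzip compression.
resaveRdaFiles(f, compress = "gzip")
after <- file.info(f)$size

# Report the size and compression type now recorded for the file.
print(checkRdaFiles(f))
cat("bytes before:", before, " bytes after:", after, "\n")
```

This only touches an already serialized .rda file, so it does not need the package to be installed. The failure discussed in this thread comes from the data/*.R scripts, which have to be evaluated before their results can be resaved.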
>>>
>>> So it does make some sense to have different defaults in build as
>>> opposed to INSTALL from my point of view (although I could live with
>>> different ones, though).
>>
>> If you deliberately ignore the fact that 'R CMD INSTALL' is also used
>> by developers to install from the *package source tree* (as opposed
>> to end users who use it to install from a *source tarball*, even though
>> they don't use it directly), then you have a point. So maybe I should
>> have been more explicit about the problem it creates for the
>> *developer* when 'R CMD build' and 'R CMD INSTALL' behave
>> differently by default.
>>
>> Of course I'm not suggesting that 'R CMD INSTALL' should behave
>> differently (by default) depending on whether it's used on a source
>> tarball (mode 1) or a package source tree (mode 2).
>>
>> I'm suggesting that, by default, the 3 commands (R CMD build +
>> R CMD INSTALL in mode 1 and 2) behave consistently.
>>
>> With the latest changes, and by default, 'R CMD INSTALL' is still doing
>> the right thing, but not 'R CMD build' anymore.
>>
>> I perfectly understand the intention behind those new flags, which is
>> to try to "optimize" the resulting source tarball but what would you
>> think if 'gcc' had some optimization flags that can generate broken
>> executables (under some circumstances) and if these flags were enabled
>> by default?
>>
>> Note that I would have no problem with 'R CMD build' trying to resave
>> the data by default if the current implementation of that feature
>> was working properly, but unfortunately it's broken (see my previous
>> email for the details).
>
> It is one thing to talk about sensible defaults and another thing to
> talk about bugs. I just talked about sensible defaults. And I have not
> had the time to look into details. I just arrived in Dortmund 15
> minutes ago and the first thing I have to do is repair some
> winbuilder stuff and get 2.13.0 ready on it. I may look into other
> details later this week or at the beginning of next week.

No problem. I understand perfectly. Release time is a very busy time
on the Bioconductor side too. Thanks for looking into this!

H.

>
> Uwe
>
>
>
>> Thanks,
>> H.
>>
>>>
>>> If you need further arguments for the discussion: I also tend to use
>>> --no-vignettes nowadays if my code does not change considerably. ;-)
>>>
>>> Best wishes,
>>> Uwe
>>>
>>>
>>>
>>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
>>>> build' (and make it consistent with R CMD INSTALL) is not going to
>>>> grow too much ;-)
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>
>>
>>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


