[Rd] CRAN package sizes
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Feb 15 10:40:39 CET 2011
On Sun, 13 Feb 2011, Yihui Xie wrote:
> Regarding the reasons that make the doc directory large, I wonder if
> we can make some changes in R:
'we' cannot: only core developers can. However, end users can
contribute in many other ways: see below.
> 1. Use a null graphics device as the default device rather than pdf()
> when running Sweave -- this can avoid the useless Rplots.pdf:
>
> options(device = function(...) {
> .Call("R_GD_nullDevice", PACKAGE = "grDevices")
> })
>
> This can save some time in building the vignette(s) as well. (see
> http://yihui.name/en/?p=673)
>
> However, this undocumented null device may not work for certain
> graphics. Here is an example where it fails for ggplot2:
> http://stackoverflow.com/questions/4692974/ggplot2-code-that-works-interactively-rkward-crashes-under-lyx-pgfsweave-hint/4707745#4707745
>
> Is it possible for someone to look into the null device (Dr Murrell?)
> to make it stable enough?
I don't see a bug report on that, and a patch would help expedite
this.
> 2. Compress the PDF graphics and vignettes using third-party tools,
> among which I recommend qpdf (it's free).
>
> qpdf --stream-data=compress input.pdf output.pdf
>
> This can reduce the size of PDF files a lot without quality loss. I'm
> using this tool in the animation package to reduce the size of PDF
> animations.
*Can*, but I did say
'There are several ways to reduce the sizes of PDFs with no loss in
quality, e.g. Adobe Acrobat Standard/Pro.'
and qpdf is often ineffective (or worse), e.g. on package mokken. The
problem is that many of the large packages need images re-saved in
some other format (or preferably re-generated in some other format).
I've added a --compact-vignettes option to R CMD build (in R-devel).
At present it uses qpdf, but I will look out for better/additional
options. (I use Acrobat 9 Pro on my Mac and that has always beaten
qpdf, often by a large margin. But qpdf is perhaps the most readily
available of these tools.)
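
Since qpdf can leave a file no smaller (or even larger), one cautious
way to apply it by hand is to keep its output only when that is an
improvement. What follows is a minimal sketch, assuming a POSIX shell
and qpdf on the path. The function name compact_pdf is hypothetical,
the flag is the one quoted above, and this is not a description of what
--compact-vignettes does internally.

```shell
# compact_pdf FILE (hypothetical helper, not part of R or qpdf)
# Run qpdf on FILE and replace it only if the result is smaller.
compact_pdf() {
    src=$1
    tmp="$src.tmp.$$"
    # Treat any non-zero exit status as failure and keep the original.
    if ! qpdf --stream-data=compress "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # $(( ... )) strips the padding some wc implementations add.
    old=$(( $(wc -c < "$src") ))
    new=$(( $(wc -c < "$tmp") ))
    if [ "$new" -lt "$old" ]; then
        mv "$tmp" "$src"
        echo "$src: $old -> $new bytes"
    else
        rm -f "$tmp"
        echo "$src: kept original ($old bytes)"
    fi
}
```

It would be called as 'compact_pdf inst/doc/foo.pdf' (the path is
illustrative). Files that do not shrink are left untouched, which
usually signals that the figures need re-generating in another format
rather than re-compressing.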
> 3. Sorry I bring up this issue again, but I don't understand why
> Sweave could not implement the png() device along with pdf() and
> postscript(). I'm willing to provide a patch if needed.
Does it need changes to R? I believe that it just needs a
different driver, something which could be provided in a package.
This has been raised several times (including recently) with the
Sweave maintainer, so maybe it will happen eventually. But a package
would retrofit it to earlier versions of R.
>
> Thanks!
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
>
>
> On Sun, Feb 13, 2011 at 6:30 AM, Prof Brian Ripley
> <ripley at stats.ox.ac.uk> wrote:
>> Robin Hankin's post reminded me to post about the following recent addition
>> to 'Writing R Extensions', in the section on 'Submitting a package to CRAN'
>>
>> Ensure that the package sources are not unnecessarily large. ...
>> As a general rule, doc directories should not exceed 5Mb, and
>> where data directories need to be 10Mb or more, consideration should
>> be given to a separate package containing just the data. (Similarly
>> for external data directories, large jar files and other libraries
>> that need to be installed.)
>>
>> With 2800 packages on CRAN, overall size is becoming a concern and currently
>> to install all of CRAN takes 4Gb. As the attached (I hope) graph shows, the
>> 20 packages over 20Mb take a quarter, and those over 5Mb take half. (And
>> this is after we have removed 100Mb from the largest installed package by
>> re-compression, and archived the second largest, so Robin's package is
>> currently the largest.) Some of the largest packages are data/jar packages,
>> but there are 55 packages with 'doc' directories over 5Mb. To put that in
>> perspective, PDFs of whole books with lots of figures (MASS, Paul's R
>> Graphics) are well under 5Mb.
>>
>> R CMD check in R-devel reports on large packages, and expect in future that
>> submitted package sizes will be questioned more often.
>>
>> There are lots of different reasons why doc directories are large, but the
>> major ones are
>>
>> - installing files that are unneeded, such as Rplots.pdf and .eps
>> figures.
>> - using PDF figures of images where PNG would be more appropriate.
>> - including less than relevant material (such as how to install R,
>> with screenshots!)
>>
>> There are several ways to reduce the sizes of PDFs with no loss in quality,
>> e.g. Adobe Acrobat Standard/Pro.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595