[Rd] CRAN package sizes

Yihui Xie xie at yihui.name
Sun Feb 13 22:02:32 CET 2011


Regarding the reasons that make the doc directory large, I wonder if
we can make some changes in R:

1. Use a null graphics device as the default device rather than pdf()
when running Sweave -- this can avoid the useless Rplots.pdf:

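# Make a null graphics device the default, so that plots drawn outside
# figure chunks do not create Rplots.pdf (undocumented internal entry point).
options(device = function(...) {
    .Call("R_GD_nullDevice", PACKAGE = "grDevices")
})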

This can save some time in building the vignette(s) as well. (see
http://yihui.name/en/?p=673)

However, this undocumented null device may not work for certain
graphics. Here is an example where it fails with ggplot2:
http://stackoverflow.com/questions/4692974/ggplot2-code-that-works-interactively-rkward-crashes-under-lyx-pgfsweave-hint/4707745#4707745

Is it possible for someone to look into the null device (Dr Murrell?)
to make it stable enough?
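
Until the null device is stable, a documented stand-in might be a
pdf() device that writes no file. The sketch below assumes a version
of R in which pdf(file = NULL) is supported; it is not the
R_GD_nullDevice approach above, and whether it helps with the ggplot2
failure is untested here. The scratch directory and the variable
wrote_file are for illustration only.

```r
# Work in a scratch directory so the check below is not confused by an
# Rplots.pdf left over from an earlier run.
owd <- setwd(tempdir())
unlink("Rplots.pdf")

# Assumption: pdf(file = NULL) opens a device that creates no file.
old <- options(device = function(...) grDevices::pdf(file = NULL))

plot(1:10)                                # opens the default device
wrote_file <- file.exists("Rplots.pdf")   # did a stray file appear?
invisible(dev.off())

# Restore the previous settings.
options(old)
setwd(owd)
```

If this behaves as assumed, wrote_file is FALSE, i.e. no Rplots.pdf is
left behind to be installed with the vignette.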

2. Compress the PDF graphics and vignettes using third-party tools,
among which I recommend qpdf (it's free).

qpdf --stream-data=compress input.pdf output.pdf

This can reduce the size of PDF files a lot without quality loss. I'm
using this tool in the animation package to reduce the size of PDF
animations.
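
The qpdf call above could be wrapped in R roughly as follows. This is
only a sketch: compress_pdf() is a hypothetical helper, not a function
from the animation package, and it assumes the qpdf executable is on
the PATH.

```r
# Hypothetical helper: run qpdf with the flag shown above and return the
# ratio of the output size to the input size.
compress_pdf <- function(input, output, qpdf = unname(Sys.which("qpdf"))) {
  if (!nzchar(qpdf)) {
    warning("qpdf not found on the PATH; nothing done")
    return(NA_real_)
  }
  system2(qpdf, c("--stream-data=compress", shQuote(input), shQuote(output)))
  if (!file.exists(output)) stop("qpdf did not produce ", output)
  file.info(output)$size / file.info(input)$size
}

# Usage (file names are placeholders):
# ratio <- compress_pdf("input.pdf", "output.pdf")
```

A ratio below 1 means the output is smaller than the input; how much
is saved depends on how the PDF was produced.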

3. Sorry I bring up this issue again, but I don't understand why
Sweave could not implement the png() device along with pdf() and
postscript(). I'm willing to provide a patch if needed.
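
To illustrate why a png() option matters for figures with many
elements, the following sketch writes the same scatterplot with both
devices and reports the file sizes in bytes. It is not part of Sweave;
figure_sizes() is a hypothetical helper, and the number of points and
the image dimensions are arbitrary choices.

```r
# Hypothetical helper: draw the same dense scatterplot to a PDF and a PNG
# file and return both file sizes in bytes.
figure_sizes <- function(n = 50000) {
  set.seed(1)
  x <- rnorm(n)
  y <- rnorm(n)
  pdf_file <- tempfile(fileext = ".pdf")
  png_file <- tempfile(fileext = ".png")

  pdf(pdf_file)
  plot(x, y, pch = ".")
  dev.off()

  png(png_file, width = 480, height = 480)
  plot(x, y, pch = ".")
  dev.off()

  sizes <- file.info(c(pdf_file, png_file))$size
  names(sizes) <- c("pdf", "png")
  unlink(c(pdf_file, png_file))
  sizes
}

# png() is not available in every build of R, so check first.
sizes <- if (capabilities("png")) figure_sizes() else NULL
print(sizes)
```

The PDF stores every point as a separate object, so its size grows with
n, whereas the PNG size is bounded by the pixel dimensions.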

Thanks!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Sun, Feb 13, 2011 at 6:30 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> Robin Hankin's post reminded me to post about the following recent addition
> to 'Writing R Extensions', in the section on 'Submitting a package to CRAN'
>
>  Ensure that the package sources are not unnecessarily large. ...
>  As a general rule, doc directories should not exceed 5Mb, and
>  where data directories need to be 10Mb or more, consideration should
>  be given to a separate package containing just the data. (Similarly
>  for external data directories, large jar files and other libraries
>  that need to be installed.)
>
> With 2800 packages on CRAN, overall size is becoming a concern and currently
> to install all of CRAN takes 4Gb.  As the attached (I hope) graph shows, the
> 20 packages over 20Mb take a quarter, and those over 5Mb take half.  (And
> this is after we have removed 100Mb from the largest installed package by
> re-compression, and archived the second largest, so Robin's package is
> currently the largest.)  Some of the largest packages are data/jar packages,
> but there are 55 packages with 'doc' directories over 5Mb.  To put that in
> perspective, PDFs of whole books with lots of figures (MASS, Paul's R
> Graphics) are well under 5Mb.
>
> R CMD check in R-devel reports on large packages, and expect in future that
> submitted package sizes will be questioned more often.
>
> There are lots of different reasons why doc directories are large, but the
> major ones are
>
> - installing files that are unneeded, such as Rplots.pdf and .eps
>  figures.
> - using PDF figures of images where PNG would be more appropriate.
> - including less than relevant material (such as how to install R,
>  with screenshots!)
>
> There are several ways to reduce the sizes of PDFs with no loss in quality,
> e.g. Adobe Acrobat Standard/Pro.
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


