[R] Reducing the size of pdf graphics files produced with R

Thu May 24 16:59:23 CEST 2007

Hi again,

Many of you suggested alternatives to the pdf device, and/or
converting or compressing the pdf outside of R.

I ran some tests on a small, a medium-size and a large figure. Here I  
summarize the results, which depend very much on the original  
graphics file. Please note that I wish to retain a vector-based  
graphic file.

You'll find at the end of this message the R program used to produce  
the graphics files.

Starting with a small size graphic file: in order, these were  
produced by

1) postscript device
2) pdf device
3) bitmap device (pdf output)
4) dev2bitmap, pdf output, from a quartz window
5) quartz device saved to pdf via command quartz.save
6) quartz device saved to pdf via save menu in R gui

-rw-r--r--   1 chabotd  chabotd    243446 May 23 21:00 test_ps_from_R.ps
-rw-r--r--   1 chabotd  chabotd    572513 May 23 21:00 test_pdf_from_R.pdf
-rw-r--r--   1 chabotd  chabotd    600106 May 24 09:21 test_pdf_bitmapR.pdf
-rw-r--r--   1 chabotd  chabotd    600050 May 24 09:22 test_dev2bitmap.pdf
-rw-r--r--   1 chabotd  chabotd   1657446 May 23 21:00 test_pdf_from_quartz.save.pdf
-rw-r--r--   1 chabotd  chabotd    572634 May 23 21:01 test_pdf_from_quartz.menu.pdf

These show how "test_pdf_from_R.pdf" can be shrunk outside of R
1) the command pdftk
2) opening the pdf in any Mac OS X pdf viewer and doing "print to  
compressed pdf"

-rw-r--r--   1 chabotd  chabotd     68742 May 24 09:25 test_pdf_pdftk.pdf
-rw-r--r--   1 chabotd  chabotd    100660 May 23 21:16 test_pdf_print_to_comppdf.pdf

Finally, these show 3 conversions from postscript to pdf outside of R
1) command ps2pdf
2) command epstopdf
3) command pstopdf

-rw-r--r--   1 chabotd  chabotd    566626 May 23 21:12 test_ps_ps2pdf.pdf
-rw-r--r--   1 chabotd  chabotd    566587 May 24 10:21 test_ps_epstopdf.pdf
-rw-r--r--   1 chabotd  chabotd   1939788 May 24 10:20 test_ps_pstopdf.pdf

For this first example, all pdf produced directly within R were of  
similar size, except one (quartz.save) that was 3x larger. Producing  
a postscript file and transforming it into pdf resulted in no  
significant saving. However pdf output from R can be shrunk (here to  
12% of original size) with pdftk. So far I found no adverse effect of  
this shrinking.
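
The 12% figure can be checked with a few lines of R. This is only a
sketch: the byte counts are copied by hand from the listings above, and
the vector names (pdf_device, pdftk, print_to_comppdf) are labels I made
up for the illustration.

```r
# File sizes in bytes, copied from the listings above
sizes <- c(pdf_device = 572513, pdftk = 68742, print_to_comppdf = 100660)

# Size of each file as a percentage of the pdf device output
pct <- round(100 * sizes / sizes["pdf_device"], 1)
print(pct)
```

With the files on disk, file.info(filenames)$size should give the same
byte counts without any retyping.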

I did the same with a larger graphic, an example that came from Dave
Watson. Using the same groupings as above:

Produced with R:
-rw-r--r--   1 chabotd  chabotd    854320 May 24 09:08 mauna_ps_from_R.eps
-rw-r--r--   1 chabotd  chabotd   1000504 May 24 09:08 mauna_pdf_from_R.pdf
-rw-r--r--   1 chabotd  chabotd     96737 May 24 09:08 mauna_pdf_bitmapR.pdf
-rw-r--r--   1 chabotd  chabotd     97236 May 24 09:17 mauna_dev2bitmap.pdf
-rw-r--r--   1 chabotd  chabotd    468195 May 24 09:08 mauna_pdf_from_quartz.save.pdf
-rw-r--r--   1 chabotd  chabotd    999853 May 24 09:09 mauna_pdf_from_quartz.menu.pdf

PS to pdf outside of R
-rw-r--r--   1 chabotd  chabotd     95024 May 24 09:11 mauna_ps_ps2pdf.pdf
-rw-r--r--   1 chabotd  chabotd    603021 May 24 10:40 mauna_ps_pstopdf.pdf
-rw-r--r--   1 chabotd  chabotd     95015 May 24 10:40 mauna_ps_epstopdf.pdf

pdf transformation outside of R
-rw-r--r--   1 chabotd  chabotd    104487 May 24 09:12 mauna_pdf_pdftk.pdf
-rw-r--r--   1 chabotd  chabotd    134663 May 24 09:23 mauna_print_to_comppdf.pdf

For this example, the different methods of producing pdf within R gave
very different file sizes. The two quartz-based methods ranked in the
reverse order compared to the previous example. Overall, using the
bitmap device, or producing postscript and converting it to pdf outside
of R, gave files about 10% the size of the file produced by the pdf
device. But the latter could be shrunk almost as much using pdftk.

Finally, a larger-size example:
Produced with R:
-rw-r--r--   1 chabotd  chabotd   1426330 May 23 20:54 fig_ps_from_R.ps
-rw-r--r--   1 chabotd  chabotd   3384788 May 23 20:54 fig_pdf_from_R.pdf
-rw-r--r--   1 chabotd  chabotd   3494689 May 24 09:03 fig_pdf_bitmapR.pdf
-rw-r--r--   1 chabotd  chabotd   3494689 May 24 10:46 fig_dev2bitmap.pdf
-rw-r--r--   1 chabotd  chabotd   3384832 May 23 20:54 fig_pdf_from_quartz.menu.pdf
-rw-r--r--   1 chabotd  chabotd   9583552 May 23 20:52 fig_pdf_from_quartz.save.pdf

PS to pdf outside of R
-rw-r--r--   1 chabotd  chabotd   3356223 May 23 21:12 fig_ps_ps2pdf.pdf
-rw-r--r--   1 chabotd  chabotd  11397461 May 23 23:51 fig_ps_pstopdf.pdf
-rw-r--r--   1 chabotd  chabotd   3354762 May 23 23:55 fig_ps_epstopdf.pdf

pdf transformation outside of R
-rw-r--r--   1 chabotd  chabotd    379307 May 23 22:31 fig_pdf_comptk.pdf
-rw-r--r--   1 chabotd  chabotd    520509 May 24 00:19 fig_pdf_print_to_comppdf.pdf

This time, as in the first example, there was little benefit in going
the bitmap device or ps-to-pdf route. Only shrinking the pdf with
pdftk was effective. So the two examples with a lot of objects on the
plot do not seem to benefit from postscript use, but the one example
with few objects did (its objects were "filled"; I don't know if that
matters).

I have never done this in R, but could the pdftk command be run from
within an R script? This would allow one to compress automatically
when needed.
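
A minimal sketch of how this might look, assuming pdftk is installed and
on the PATH. The call it builds is "pdftk in.pdf output out.pdf
compress". The function names pdftk_args and compress_pdf, and the
"_pdftk.pdf" suffix for the output file, are my own inventions for the
illustration, and I have not tested this on other platforms.

```r
# Build the argument vector for: pdftk <infile> output <outfile> compress
pdftk_args <- function(infile, outfile) {
  c(shQuote(infile), "output", shQuote(outfile), "compress")
}

# Compress one pdf with pdftk.
# Returns new size / old size, or NA if pdftk is missing or the call fails.
compress_pdf <- function(infile,
                         outfile = sub("\\.pdf$", "_pdftk.pdf", infile)) {
  if (!nzchar(Sys.which("pdftk"))) {
    warning("pdftk not found on the PATH; file left uncompressed")
    return(NA_real_)
  }
  status <- system2("pdftk", pdftk_args(infile, outfile))
  if (status != 0 || !file.exists(outfile)) return(NA_real_)
  file.info(outfile)$size / file.info(infile)$size
}
```

Calling compress_pdf("test_pdf_from_R.pdf") right after dev.off() would
then write test_pdf_from_R_pdftk.pdf next to the original and report how
much was saved. The output goes to a separate file because, as far as I
know, pdftk will not overwrite its own input.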

Thank you all for the suggestions,

Denis

##############  R program that produced the above files  ##############
# example 1, small
pdf(file="test_pdf_from_R.pdf", width=5, height=5, version="1.4",
bg="transparent")
plot(rnorm(10000), rnorm(10000))
dev.off()

postscript(file="test_ps_from_R.ps", width=5, height=5, paper="special")
plot(rnorm(10000), rnorm(10000))
dev.off()

bitmap(file = "test_pdf_bitmapR.pdf", width=5, height=5, type =  
"pdfwrite")
plot(rnorm(10000), rnorm(10000))
dev.off()

plot(rnorm(10000), rnorm(10000))
quartz.save(file="test_pdf_from_quartz.save.pdf", type="pdf")
dev2bitmap(file="test_dev2bitmap.pdf", width=5, height=5,  
type="pdfwrite")
# here I also manually saved the quartz graphics and called it
# "test_pdf_from_quartz.menu.pdf"

# Example from Dave Watson

postscript(file = "mauna_ps_from_R.eps", width=5, height=5,  
horizontal=FALSE, paper="special", onefile=FALSE)
filled.contour(volcano, color=terrain.colors, asp=1)
title(main="volcano data: filled contour map")
dev.off()

pdf(file = "mauna_pdf_from_R.pdf", width=5, height=5)
filled.contour(volcano, color=terrain.colors, asp=1)
title(main="volcano data: filled contour map")
dev.off()

bitmap(file = "mauna_pdf_bitmapR.pdf", width=5, height=5, type =  
"pdfwrite")
filled.contour(volcano, color=terrain.colors, asp=1)
title(main="volcano data: filled contour map")
dev.off()

# on mac os x only
quartz(width=5, height=5)
filled.contour(volcano, color=terrain.colors, asp=1)
title(main="volcano data: filled contour map")
dev2bitmap(file="mauna_dev2bitmap.pdf", width=5, height=5,  
type="pdfwrite")
quartz.save(file="mauna_pdf_from_quartz.save.pdf", type="pdf")
# here I also manually saved the quartz graphics and called it
# "mauna_pdf_from_quartz.menu.pdf"

# example 3, large

x <- rep(1:99, 20)

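# build six noisy linear responses Y1..Y6, one per (intercept, slope) pair;
# collecting them in a list avoids assign() and a counter named c
the.data <- list()
for (a in 1:3) {
   for (b in c(0.7, 0.9)) {
      nam <- paste("Y", length(the.data) + 1, sep="")
      the.data[[nam]] <- a + b*x + rnorm(length(x), 20, 10)
   }
}
the.data <- as.data.frame(the.data)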

pdf(file="fig_pdf_from_R.pdf", width=8, height=8, version="1.4",
bg="transparent")
pairs(the.data)
dev.off()

postscript(file="fig_ps_from_R.ps", width=8, height=8, paper="special")
pairs(the.data)
dev.off()

bitmap(file = "fig_pdf_bitmapR.pdf", width=8, height=8, type =  
"pdfwrite")
pairs(the.data)
dev.off()

# on mac os x only
quartz(width=8, height=8)
pairs(the.data)
dev2bitmap(file="fig_dev2bitmap.pdf", width=8, height=8,  
type="pdfwrite")
quartz.save(file="fig_pdf_from_quartz.save.pdf", type="pdf")
# here I also manually saved the quartz graphics and called it
# "fig_pdf_from_quartz.menu.pdf"