[R] scatterplot of 100000 points and pdf file format

Liaw, Andy andy_liaw at merck.com
Wed Nov 24 17:37:29 CET 2004


I have no experience with it, but I believe the hexbin package in BioC exists
for exactly this purpose: avoiding heavy over-plotting when there are lots of
points.  You might want to look into that, if you have not done so yet.
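
A minimal sketch of what that might look like, assuming the hexbin package
is installed. The sample data, the xbins value and the output file name are
placeholders of mine, not anything from the original question:

```r
# Sketch only: bin the points into hexagonal cells and plot the counts,
# so the pdf describes occupied cells instead of individual points.
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)

  set.seed(1)
  n <- 100000
  x <- rnorm(n)
  y <- rnorm(n)

  # xbins = 50 is an arbitrary choice; larger values give finer cells.
  bin <- hexbin(x, y, xbins = 50)

  out_file <- file.path(tempdir(), "hexbin-sketch.pdf")
  pdf(out_file)
  plot(bin)   # hexbin's own plot method (grid graphics)
  dev.off()
}
```

The number of drawing instructions then depends on the number of occupied
cells rather than on the number of points, which should keep the file small.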


> From: Marc Schwartz
> On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
> > Hi,
> > 
> > I want to draw a scatter plot with 1M or more points and save it as pdf.
> > This makes the pdf file large.
> > So I tried to save the file first as png and then convert it to pdf.
> > This looks OK if printed, but if viewed e.g. with acrobat as a document
> > figure the quality is bad.
> > 
> > Does anyone know a way to reduce the size but keep the quality?
>
> Hi Eryk!
>
> Part of the problem is that in a pdf file, the vector based instructions
> will need to be defined for each of your 10^6 points in order to draw
> them.
>
> When trying to create a simple example:
>
>   pdf()
>   plot(rnorm(1000000), rnorm(1000000))
>   dev.off()
>
> The pdf file is 55 MB in size.
>
> One immediate thought was to try a ps file, and using the above plot, the
> ps file was "only" 23 MB in size. So note that ps can be more efficient.
>
> Going to a bitmap might result in a much smaller file, but as you note,
> the quality does degrade as compared to a vector based image.
>
> I tried the above to a png, then converted to a pdf (using 'convert')
> and as expected, the image both viewed and printed was "pixelated",
> since the pdf instructions are presumably drawing pixels and not vector
> based objects.
>
> Depending upon what you plan to do with the image, you may have to
> choose among several options, resulting in tradeoffs between image
> quality and file size.
>
> If you can create the bitmap file explicitly in the size that you
> require for printing or incorporating in a document, that is one way to
> go and will preserve, to an extent, the overall fixed size image
> quality, while keeping file size small.
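
A rough sketch of that idea, writing the png directly at its final size
instead of rescaling it afterwards. The 6 x 6 inch size, the 300 dpi
resolution, the sample data and the file name are all assumptions of mine:

```r
# Sketch only: render straight to a bitmap at the intended print size.
set.seed(1)
n <- 100000
x <- rnorm(n)
y <- rnorm(n)

png_file <- file.path(tempdir(), "scatter-sketch.png")

# capabilities("png") is FALSE on builds without a png device.
if (capabilities("png")) {
  # width/height are in inches here; res is pixels per inch.
  png(png_file, width = 6, height = 6, units = "in", res = 300)
  plot(x, y, pch = ".")
  dev.off()
}
```

The file size is then governed by the pixel dimensions (1800 x 1800 here),
not by the number of points.
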
>
> Another option to consider for the pdf approach, if it does not
> compromise the integrity of your plot, is to remove any duplicate data
> points if any exist. Thus, you will not need what are in effect
> redundant instructions in the pdf file. This may not be possible
> depending upon the nature of your data (i.e. doubles) without considering
> some tolerance level for "equivalence".
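
One way the tolerance idea might be sketched: snap each point to a grid of
width tol and keep a single representative per grid cell. The value of tol,
the sample data and the file name are hypothetical, and whether dropping
points is acceptable depends entirely on the plot:

```r
# Sketch only: treat points that fall in the same tol x tol cell as
# "equivalent" and keep the first one seen in each cell.
set.seed(1)
n <- 100000
x <- rnorm(n)
y <- rnorm(n)

tol <- 0.01   # assumed tolerance, in data units

# Integer cell indices for every point, one row per point.
key <- cbind(round(x / tol), round(y / tol))

# duplicated() on a matrix compares whole rows.
keep <- !duplicated(key)

pdf_file <- file.path(tempdir(), "dedup-sketch.pdf")
pdf(pdf_file)
plot(x[keep], y[keep], pch = ".")
dev.off()

sum(keep)   # number of points actually written to the pdf
```

A larger tol removes more points and shrinks the file further, at the cost
of thinning out the dense regions of the plot.
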
>
> Perhaps others will have additional ideas.
>
> HTH,
>
> Marc Schwartz
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
