[R] PDF too large, PNG bad quality

Greg Snow Greg.Snow at imail.org
Thu Oct 22 23:09:43 CEST 2009

For getting the details in the outer points, here is what I do.

1. use hexbin to create the big central blob (but with additional info).  
2. use the chull function to find the outer points and save their indices in another vector
3. use chull on the rest of the points (excluding those found previously) and append their indices to the previous ones found.
4. repeat step 3 until you have about 100-250 outer points (a while loop works nicely)
5. use the points function to add just the outer points found above to the plot.

This gives a plot where the color/shade represents the density in the region holding most of the points, while the individual points out on the edges are still shown. The only things missed are possibly interesting points lying between peaks.
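
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the data are simulated, peel_outer_points and its n_target argument are names made up for this example, and the overlay uses grid.points inside the hexbin viewport, because the hexbin plot method draws with grid graphics and the base points function will not line up with it.

```r
# Hypothetical helper: peel successive convex hulls until at least
# n_target outer points have been collected (steps 2-4).
peel_outer_points <- function(x, y, n_target = 150) {
  remaining <- seq_along(x)   # indices not yet assigned to a hull
  outer <- integer(0)         # indices of the outer points found so far
  while (length(outer) < n_target && length(remaining) > 0) {
    # chull returns positions within the subset, so map them back
    hull <- chull(x[remaining], y[remaining])
    outer <- c(outer, remaining[hull])
    remaining <- remaining[-hull]
  }
  outer
}

# Simulated data standing in for the real scatterplot (an assumption).
set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
outer <- peel_outer_points(x, y, n_target = 150)

# Steps 1 and 5: hexbin for the central blob, then overlay the outer points.
# Skipped if the hexbin package is not installed.
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  library(grid)
  hb <- hexbin(x, y, xbins = 50)
  p <- plot(hb)                  # invisibly returns the plot viewports
  pushHexport(p$plot.vp)         # switch to the data coordinate system
  grid.points(x[outer], y[outer], pch = 16,
              size = unit(0.4, "char"), gp = gpar(col = "red"))
  upViewport()
}
```

Because the hulls are peeled whole, the loop can overshoot n_target by up to one hull's worth of points, which is fine for this purpose.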

Hope this helps,

Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org

> -----Original Message-----
> From: b.rowlingson at googlemail.com [mailto:b.rowlingson at googlemail.com]
> On Behalf Of Barry Rowlingson
> Sent: Thursday, October 22, 2009 2:43 PM
> To: Greg Snow
> Cc: Lasse Kliemann; r-help at r-project.org
> Subject: Re: [R] PDF too large, PNG bad quality
> On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> > The problem with the pdf files is that they are storing the
> information for every one of your points, even the ones that are
> overplotted by other points.  The png file is smaller because it only
> stores information on which color each pixel should be, not how many
> points contributed to a particular pixel being a given color.  But then
> png files convert the text to pixel information as well, which doesn't
> look good if there is post scaling.
> >
> > If you want to go the pdf route, then you need to find some way to
> reduce redundant information while still getting the main points of the
> plot.  With so many points, I would suggest looking at the hexbin
> package (bioconductor I think) as one approach, it will not be an
> identical scatterplot, but will convey the information (possibly
> better) with much smaller graphics file sizes.  There are other tools
> like sunflower plots or others, but hexbin has worked well for me.
> >
>  I've seen this kind of thing happen after waiting an hour for one of
> my printouts when queued after something submitted by one of our
> extreme value stats people. I've seen them make plots containing maybe
> a million points, most of which are in a big black blob, but they want
> to be able to show the important sixty or so points at the extremes.
>  I'm not sure what the best way to print this kind of thing is - if
> they know where the big blob is going to be then they could apply some
> cutoff to the plot and only show points outside the cutoff, and fill
> the region inside the cutoff with a black polygon...
>  Another idea may be to do a high resolution plot as a PNG (think 300
> pixels per inch of your desired final output) but do it without text
> and add that on later in a graphics package.
> Barry
