[R] scatterplot of 100000 points and pdf file format
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Nov 24 17:50:13 CET 2004
On Wed, 24 Nov 2004 Ted.Harding at nessie.mcc.ac.uk wrote:
> On 24-Nov-04 Witold Eryk Wolski wrote:
>> Hi,
>> I want to draw a scatter plot with 1M and more points
>> and save it as pdf.
>> This makes the pdf file large.
>> So I tried to save the file first as png and then convert
>> it to pdf. This looks OK if printed but if viewed e.g. with
>> acrobat as document figure the quality is bad.
>>
>> Does anyone know a way to reduce the size but keep the quality?
>
> If you want the PDF file to preserve the info about all the
> 1M points then the problem has no solution. The png file
> will already have suppressed most of this (which is one
> reason for poor quality).
>
> I think you should give thought to reducing what you need
> to plot.
>
> Think about it: suppose you plot with a resolution of
> 200 points per inch (about the limit at which the eye
> begins to see rough edges). Then you have 40000 points
> per square inch. If your 1M points are separate but as
> closely packed as possible, this requires 25 square inches,
> or a 5x5 inch (= 12.7x12.7 cm) square. And this would be
> solid black!
>
> Presumably in your plot there is a very large number of
> points which are effectively indistinguishable from other
> points, so these could be eliminated without spoiling
> the plot.
>
> I don't have an obviously best strategy for reducing what
> you actually plot, but perhaps one line to think along
> might be the following:
>
> 1. Multiply the data by some factor and then round the
> results to an integer (to avoid problems in step 2).
> Factor chosen so that the result of (4) below is
> satisfactory.
>
> 2. Eliminate duplicates in the result of (1).
>
> 3. Divide by the factor you used in (1).
>
> 4. Plot the result; save plot to PDF.
>
> As to how to do it in R: the critical step is (2),
> which with so many points could be very heavy unless
> done by a well-chosen procedure. I'm not expert enough
> to advise about that, but no doubt others are.
unique() will eat that for breakfast:
> x <- runif(1e6)
> system.time(xx <- unique(round(x, 4)))
[1] 0.55 0.09 0.64 0.00 0.00
> length(xx)
[1] 10001
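
The timing above is for a single vector. For a scatterplot, steps 1-4
quoted above might be sketched roughly as follows. The simulated data,
the factor f, the output file name and the use of paste() to build one
key per grid cell are all assumptions made for illustration; f in
particular would need tuning until the plot looks right.

```r
# Illustrative data only: 1e6 simulated points.
set.seed(1)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)

f <- 100  # assumed scaling factor: a grid of 1/100 in data units

# Step 1: scale and round to integer grid coordinates.
ix <- round(x * f)
iy <- round(y * f)

# Step 2: keep only the first point that falls in each grid cell.
# paste() builds one character key per point, so duplicated() can
# work on a plain vector.
keep <- !duplicated(paste(ix, iy))

# Step 3: divide by the factor to get back to the data scale.
px <- ix[keep] / f
py <- iy[keep] / f

# Step 4: plot the thinned points and save to PDF (hypothetical file name).
out <- file.path(tempdir(), "scatter-thinned.pdf")
pdf(out, width = 5, height = 5)
plot(px, py, pch = ".", xlab = "x", ylab = "y")
dev.off()
```

length(px) gives the number of points actually written, and comparing
file.info(out)$size for a few values of f shows the trade-off between
file size and visible detail.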
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595