[R] scatterplot of 100000 points and pdf file format
Witold Eryk Wolski
wolski at molgen.mpg.de
Thu Nov 25 09:33:46 CET 2004
Prof Brian Ripley wrote:
> On Wed, 24 Nov 2004 Ted.Harding at nessie.mcc.ac.uk wrote:
>> On 24-Nov-04 Witold Eryk Wolski wrote:
>>> I want to draw a scatter plot with 1M or more points
>>> and save it as PDF.
>>> This makes the PDF file large.
>>> So I tried to save the file first as PNG and then convert
>>> it to PDF. This looks OK when printed, but when viewed, e.g. in
>>> Acrobat, as a figure in a document, the quality is bad.
>>> Does anyone know a way to reduce the size but keep the quality?
>> If you want the PDF file to preserve the info about all the
>> 1M points then the problem has no solution. The png file
>> will already have suppressed most of this (which is one
>> reason for poor quality).
>> I think you should give thought to reducing what you need
>> to plot.
>> Think about it: suppose you plot with a resolution of
>> 200 points per inch (about the limit at which the eye
>> begins to see rough edges). Then you have 40000 points
>> per square inch. If your 1M points are separate but as
>> closely packed as possible, this requires 25 square inches,
>> or a 5x5 inch (= 12.7x12.7 cm) square. And this would be
>> solid black!
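Ted's back-of-envelope arithmetic can be checked in a few lines of R. This assumes, as he does, one distinguishable point per cell of a grid with 200 cells per inch along each axis:

```r
# Capacity of a 200-per-inch grid, and the area 1M separate points need.
dpi <- 200                   # distinguishable positions per inch, per axis
per_sq_inch <- dpi^2         # positions per square inch
n <- 1e6                     # number of points to plot
area <- n / per_sq_inch      # square inches needed
side_in <- sqrt(area)        # side of the square, in inches
side_cm <- side_in * 2.54    # same side, in centimetres
print(c(per_sq_inch = per_sq_inch, area = area,
        side_in = side_in, side_cm = side_cm))
```

This gives 40000 positions per square inch, 25 square inches, and a square of 5 inches (12.7 cm) on a side.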
>> Presumably in your plot there is a very large number of
>> points which are effectively indistinguishable from other
>> points, so these could be eliminated without spoiling
>> the plot.
>> I don't have an obviously best strategy for reducing what
>> you actually plot, but perhaps one line to think along
>> might be the following:
>> 1. Multiply the data by some factor and then round the
>> results to an integer (to avoid problems in step 2).
>> Factor chosen so that the result of (4) below is
>> 2. Eliminate duplicates in the result of (1).
>> 3. Divide by the factor you used in (1).
>> 4. Plot the result; save plot to PDF.
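A minimal sketch of steps 1 to 4 for two-dimensional data. The simulated normal data, the factor k = 100 (keep about two decimal places) and the output file name are all illustrative assumptions, not from the thread. Each (x, y) pair is turned into a string key with paste(); that key is exact only because step 1 leaves integer values, which is the reason for rounding before removing duplicates.

```r
# Hypothetical example data: 1e6 points.
set.seed(42)
n <- 1e6
x <- rnorm(n)
y <- rnorm(n)

k <- 100  # assumed factor: keeps about two decimal places

# Step 1: multiply by the factor and round to integers.
xi <- round(x * k)
yi <- round(y * k)

# Step 2: eliminate duplicate (x, y) pairs. Integer values print
# exactly, so paste() yields one unambiguous key per pair.
keep <- !duplicated(paste(xi, yi))

# Step 3: divide by the factor to return to the original scale.
px <- xi[keep] / k
py <- yi[keep] / k

# Step 4: plot the reduced set and save it to PDF.
out <- file.path(tempdir(), "scatter-reduced.pdf")
pdf(out, width = 5, height = 5)
plot(px, py, pch = ".", xlab = "x", ylab = "y")
invisible(dev.off())

cat(length(px), "of", n, "points kept\n")
```

A larger k keeps more detail and more points; a smaller k merges more points and shrinks the file.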
>> As to how to do it in R: the critical step is (2),
>> which with so many points could be very heavy unless
>> done by a well-chosen procedure. I'm not expert enough
>> to advise about that, but no doubt others are.
> unique will eat that for breakfast
> > x <- runif(1e6)
> > system.time(xx <- unique(round(x, 4)))
>  0.55 0.09 0.64 0.00 0.00
>  10001
table() (see ?table) reduces the data,
image() (see ?image) shows it.
And this does exactly what I need (not my idea, but one of Thomas
Unternäher's). Thanks, Thomas.
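A sketch of the table/image approach, assuming the same kind of simulated data and a hypothetical choice of nb = 100 equal-width bins per axis. cut() assigns each point to a bin, table() counts the points in each cell, and image() draws the counts, so the PDF holds at most nb^2 rectangles instead of a million points. The file name is made up.

```r
set.seed(42)
n <- 1e6
x <- rnorm(n)
y <- rnorm(n)

nb <- 100  # assumed number of bins per axis
xb <- seq(min(x), max(x), length.out = nb + 1)  # bin boundaries for x
yb <- seq(min(y), max(y), length.out = nb + 1)  # bin boundaries for y

# Reduce the data: count points per (x-bin, y-bin) cell.
counts <- table(cut(x, xb, include.lowest = TRUE),
                cut(y, yb, include.lowest = TRUE))

# Numeric matrix for image(); rows are x bins, columns are y bins.
z <- matrix(as.numeric(counts), nrow = nb, ncol = nb)
z[z == 0] <- NA  # leave empty cells blank

# Show it: xb and yb have nb + 1 values, so they act as cell boundaries.
out <- file.path(tempdir(), "scatter-binned.pdf")
pdf(out, width = 5, height = 5)
image(xb, yb, z, col = rev(gray.colors(64)), xlab = "x", ylab = "y")
invisible(dev.off())
```

Keeping the bins as factors matters here: table() then reports empty cells as zeros, so the count matrix always has nb rows and nb columns.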
Dipl. bio-chem. Witold Eryk Wolski
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219
http://www.molgen.mpg.de/~wolski
http://r4proteomics.sourceforge.net
mail: witek96 at users.sourceforge.net
wolski at molgen.mpg.de