[R] Scatterplot Showing All Points

Duncan Murdoch murdoch at stats.uwo.ca
Tue Dec 18 19:05:03 CET 2007


On 12/18/2007 12:44 PM, Antony Unwin wrote:
> On 18 Dec 2007, at 4:49 pm, Duncan Murdoch wrote:
> 
>>> One good alternative here is the fluctuation diagram  variant of a  
>>> mosaic plot:
>>> xx<-as.factor(x)
>>> yy<-as.factor(y)
>>> imosaic(xx,yy, type="f")
>>
>> That plot is better than jittering, but there's the problem in the  
>> mosaic plot of understanding the scale of the rectangles:  is it  
>> area or diameter that encodes the count?
> 
> Area is used.
> 
>> With a jittered plot, you lose resolution when the number of points  
>> gets too high because you just see a mess of ink, but at least you  
>> only require the viewer to count in order to get a close numerical  
>> reading from the plot.
> 
> If someone needs a count, they should be given a table.   Graphics  
> are for qualitative conclusions not details.  Anyway, counting will  
> only work for really small datasets.
> 
>> I could also claim that while imperfect, at least jittering is  
>> widely applicable.  For example, if the data were not on a regular  
>> grid, perhaps because they had been generated like this:
>>
>> xloc <- rnorm(50)
>> yloc <- rnorm(50)
>> index <- sample(1:50, 5000, replace = TRUE, prob = abs(xloc))
>> x <- xloc[index]
>> y <- yloc[index]
>>
>> then jittering still works as well (or as poorly), but the imosaic  
>> would not work at all.
> 
> That's right and that's (almost) the sort of example I was thinking  
> of.  For a limited number of locations like this a bubble plot would  
> be best (which has already been suggested in this thread, I think).   
> For many locations and few replications I would still go for varying  
> pointsize and transparency.
> 
> Incidentally, to check your suggestion I ran your code and discovered  
> that the transparency in iplot does not seem to like replications.   
> Very strange, we'll have to check why.  I then looked closely at the  
> numbers of replications generated and discovered that case 25 was  
> picked 325 times and case 40 only once.  Rather too extreme for my  
> liking!  Running it again gave very similar results, though not  
> exactly the same: this time it was 325 times for case 25 and case 40  
> was not picked at all.  Other numbers varied slightly.  This is not  
> what I expected, any ideas?

abs(xloc) is used as the sampling weight, and it typically varies by a 
factor of about 100 from smallest to largest.  Sometimes the smallest 
value is very close to zero, so the ratio is much bigger and that case 
is hardly ever picked.
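
A rough sketch of how to check this, reusing the variable names from the 
code quoted above.  The seed, the helper names (w, ratio, expected, 
counts, ratios) and the 1000 repetitions are my own arbitrary choices for 
illustration, not part of the original example:

```r
# Sketch only: the seed is arbitrary and just makes the run repeatable.
set.seed(1)

xloc <- rnorm(50)
w <- abs(xloc)                 # sampling weights, as in the quoted code

# Ratio of the largest weight to the smallest one
ratio <- max(w) / min(w)

# Expected number of times each case is picked in 5000 draws
expected <- 5000 * w / sum(w)

# Observed counts from one run of the sampling step
index <- sample(1:50, 5000, replace = TRUE, prob = w)
counts <- tabulate(index, nbins = 50)

print(ratio)
print(round(range(expected), 1))
print(range(counts))

# How the max/min ratio behaves over many fresh draws of xloc
ratios <- replicate(1000, {
  ww <- abs(rnorm(50))
  max(ww) / min(ww)
})
print(summary(ratios))
```

If xloc is kept fixed and only the sample() call is repeated, the counts 
should stay close to "expected" from run to run, which would be 
consistent with the very similar results reported for the second run.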

Duncan Murdoch

> 
>> P.S. iplots 1.1-1 may have an init problem in Windows: in my first  
>> attempt, the plot made the boxes too large to fit in their cells,  
>> but it fixed itself when I resized the window, and the bug doesn't  
>> seem to be repeatable.
> 
> Thanks.  This happens occasionally on the Mac too.  Refreshing solves  
> it in practice, but we need to find out why it can happen (and stop  
> it happening!).
> 
> Antony Unwin
> Professor of Computer-Oriented Statistics and Data Analysis,
> University of Augsburg,
> Germany


