[R] Scatterplot : smoothing colors according to density of points

Benjamin Dubreuil benjamin.dubreuil at weizmann.ac.il
Tue Jun 2 12:37:04 CEST 2015


Hello everyone,

I have a data frame D with 4 columns: id, X, Y, C.
I want to draw a simple scatter plot of D$X vs. D$Y, using the D$C values as colors (id is just a text string that is not used for the plot).

However, I don't want to use the raw values of D$C. I would prefer to replace each value with the average of D$C over the points within a fixed neighborhood.
In other words, I would like to smooth the colors according to the density of points.

I am looking for any function or package that could do this.
So far, I have looked at the function kde2d in the MASS package, which estimates the density of points in two dimensions, but I don't see how I could then use this information to recalculate my D$C values.
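To make the goal concrete, here is a minimal base R sketch of one possible reading of "average over a fixed neighborhood": each point's C is replaced by the mean of C over all points within a fixed radius. The function name smooth_by_neighbors, the radius argument, the rescaling of both axes to 0-1, and the simulated data are all assumptions made for illustration; this is not a kde2d-based method and may not be the smoothing that is ultimately wanted.

```r
# Sketch: replace each point's value by the mean over all points within a
# fixed radius, measured in (rescaled X, rescaled log10 Y) space.
# smooth_by_neighbors() and 'radius' are hypothetical names, not from a package.
smooth_by_neighbors <- function(x, y, value, radius = 0.1) {
  # Put both axes on a comparable 0-1 scale; log10 mirrors plotting with log='y'.
  rescale <- function(v) (v - min(v)) / (max(v) - min(v))
  coords <- cbind(rescale(x), rescale(log10(y)))
  # Full pairwise distance matrix: acceptable for a few thousand points.
  d <- as.matrix(dist(coords))
  # Average 'value' over the neighbors of each point (the point itself included).
  vapply(seq_len(nrow(d)),
         function(i) mean(value[d[i, ] <= radius]),
         numeric(1))
}

# Simulated stand-in for D (assumption: the real data are not reproduced here).
set.seed(1)
n <- 500
D.sim <- data.frame(X = runif(n, 0, 100),
                    Y = 10^runif(n, -3, 4),
                    C = rnorm(n, 0, 0.1))
D.sim$C.smooth <- smooth_by_neighbors(D.sim$X, D.sim$Y, D.sim$C, radius = 0.1)
```

The smoothed column can then be passed to cut() in place of the raw values. A larger radius gives smoother colors; for much larger data sets the full distance matrix would become too expensive and a nearest-neighbor search would be a better fit.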

Here is a piece of the data frame:
> head(D)
      id         X         Y            C
1 O13297 44.444444  21.61220 -0.136651639
2 O13329 31.272085   4.01590 -0.117016949
3 O13525  6.865672   2.43884 -0.161173913
4 O13539 14.176245   7.81217 -0.075756757
5 O13541 73.275862   3.59012 -0.006988235
6 O13547 28.991597 258.99900 -0.013985507

> dim(D)
[1] 3616    4

> apply(D[,-1],2,range)
               X          Y          C
[1,]   0.3378378     0.0003 -0.7382222
[2,] 100.0000000 24556.4000  0.5582500
(Y spans several orders of magnitude, so I use log='y' in the plot function)

I used a palette of 100 colors ranging from blue to yellow to red:
> pal = colorRampPalette(c("blue","yellow","red"))(100)

To map the D$C values to colors, I used cut with the following breaks (101 breaks from -1.2 to 1.2, i.e. 100 intervals, one per color):
> BREAKS
  [1] -1.2000 -0.8000 -0.4000 -0.3600 -0.3200 -0.2800 -0.2400 -0.2000 -0.1925
 [10] -0.1850 -0.1775 -0.1700 -0.1625 -0.1550 -0.1475 -0.1400 -0.1368 -0.1336
 [19] -0.1304 -0.1272 -0.1240 -0.1208 -0.1176 -0.1144 -0.1112 -0.1080 -0.1048
 [28] -0.1016 -0.0984 -0.0952 -0.0920 -0.0888 -0.0856 -0.0824 -0.0792 -0.0760
 [37] -0.0728 -0.0696 -0.0664 -0.0632 -0.0600 -0.0568 -0.0536 -0.0504 -0.0472
 [46] -0.0440 -0.0408 -0.0376 -0.0344 -0.0312 -0.0280 -0.0248 -0.0216 -0.0184
 [55] -0.0152 -0.0120 -0.0088 -0.0056 -0.0024  0.0008  0.0040  0.0072  0.0104
 [64]  0.0136  0.0168  0.0200  0.0232  0.0264  0.0296  0.0328  0.0360  0.0392
 [73]  0.0424  0.0456  0.0488  0.0520  0.0552  0.0584  0.0616  0.0648  0.0680
 [82]  0.0712  0.0744  0.0776  0.0808  0.0840  0.0872  0.0904  0.0936  0.0968
 [91]  0.1000  0.1250  0.1500  0.1750  0.2000  0.2250  0.2500  0.4875  0.7250
[100]  0.9625  1.2000
> C.levels = as.numeric(cut(D$C,breaks=BREAKS))
> length(C.levels)
[1] 3616

C.levels ranges from 2 to 98, and to plot the colors I used pal[C.levels]:
> plot(x=D$X, y=D$Y, col=pal[C.levels], log='y')
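For anyone who wants to reproduce the color mapping without the data, the steps above can be sketched as follows. The printed breaks look piecewise linear, so the sketch rebuilds them from segment endpoints; the names knots and steps and this construction are assumptions inferred from the printed values, not the code originally used to create BREAKS. The four example values are taken from the head(D) and range output above.

```r
# Rebuild the 101 breaks from piecewise-linear segments (assumed construction,
# inferred from the printed BREAKS; the original code is not shown).
knots <- c(-1.2, -0.4, -0.2, -0.14, 0.1, 0.25, 1.2)
steps <- c(2, 5, 8, 75, 6, 4)   # number of intervals in each segment, 100 total
BREAKS <- c(knots[1],
            unlist(lapply(seq_along(steps), function(i)
              # Drop the first value of each segment to avoid duplicated knots.
              seq(knots[i], knots[i + 1], length.out = steps[i] + 1)[-1])))

# 100 colors for the 100 intervals defined by the 101 breaks.
pal <- colorRampPalette(c("blue", "yellow", "red"))(100)

# A few C values copied from the output above (min, two rows of head(D), max).
C.example <- c(-0.7382222, -0.136651639, -0.006988235, 0.5582500)

# cut() returns a factor; its integer codes index the palette directly.
C.levels <- as.integer(cut(C.example, breaks = BREAKS))
cols <- pal[C.levels]
```

The resulting cols vector is what gets passed as col= to plot(). Values outside the range of BREAKS would give NA from cut() and therefore no color, which is why the breaks extend well beyond the observed range of C.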


