[R] Scatterplot : smoothing colors according to density of points
Benjamin Dubreuil
benjamin.dubreuil at weizmann.ac.il
Tue Jun 2 12:37:04 CEST 2015
Hello everyone,
I have a data frame D with 4 columns: id, X, Y and C.
I want to draw a simple scatter plot of D$X vs. D$Y, using the D$C values as the color (id is just a text string and is not used for the plot).
However, I don't want to use the raw values of D$C; I would prefer to calculate the average value of D$C over the points within a fixed neighborhood of each point.
In other words, I would like to smooth the colors according to the density of points.
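To make the idea concrete, here is a minimal sketch of one way such a neighborhood average could be computed in base R. It is only an illustration under assumptions: the data are simulated, and the names rescale01, radius and C.smooth, as well as the radius value itself, are hypothetical. Distances are measured on X and log10(Y), both rescaled to [0, 1], because the plot uses a log axis for Y.

```r
# Simulated stand-in for D (the real data are not reproduced here).
set.seed(1)
n <- 500
D <- data.frame(X = runif(n, 0, 100),
                Y = 10^runif(n, -3, 4),
                C = rnorm(n, 0, 0.1))

# Work in the coordinates used for plotting: X as is, Y on a log10 scale,
# each rescaled to [0, 1] so that a single radius applies to both axes.
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
xy <- cbind(rescale01(D$X), rescale01(log10(D$Y)))

# Pairwise Euclidean distances (n x n matrix; fine for a few thousand points).
d <- as.matrix(dist(xy))

# Hypothetical neighborhood radius, as a fraction of the rescaled axis length.
radius <- 0.05

# 0/1 indicator of "within radius"; each point is its own neighbor,
# so every row sum is at least 1.
w <- (d <= radius) + 0

# Mean of C over all points inside the neighborhood of each point.
D$C.smooth <- as.vector(w %*% D$C) / rowSums(w)
```

D$C.smooth could then be mapped to colors in place of the raw D$C values. The full distance matrix grows with the square of the number of points, so for much larger data a nearest-neighbor search would be preferable.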
I am looking for any function or package that could do this.
So far, I have looked at the kde2d function in the MASS package, which estimates the density of points in two dimensions, but I don't see how I could then use this information to recalculate my D$C values.
Here is an excerpt of the data frame:
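One possible bridge from a density estimate to smoothed values: kde2d sums Gaussian kernels centered on the points, so weighting each kernel by the C value of its point and dividing by the unweighted sum gives a kernel-weighted average (a Nadaraya-Watson style estimate). The sketch below does this directly in base R rather than through kde2d. The data are simulated, and the bandwidths hx and hy and the column name C.kern are assumptions chosen for illustration only.

```r
# Simulated stand-in for D (hypothetical values, not the real data).
set.seed(2)
n <- 400
D <- data.frame(X = runif(n, 0, 100),
                Y = 10^runif(n, -3, 4),
                C = rnorm(n, 0, 0.1))

x <- D$X
y <- log10(D$Y)          # the plot uses log = 'y', so smooth on that scale

# Hypothetical bandwidths: 5% of each axis range (to be tuned).
hx <- 0.05 * diff(range(x))
hy <- 0.05 * diff(range(y))

# Gaussian kernel weight between every pair of points (n x n matrix).
K <- dnorm(outer(x, x, "-") / hx) * dnorm(outer(y, y, "-") / hy)

# Kernel-weighted mean of C at each point. rowSums(K) is proportional
# to the kernel density estimate at that point.
D$C.kern <- as.vector(K %*% D$C) / rowSums(K)
```

Compared with a hard cutoff radius, the Gaussian weights let nearby points count more than distant ones, and points in sparse regions stay close to their own raw value.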
> head(D)
      id         X         Y            C
1 O13297 44.444444  21.61220 -0.136651639
2 O13329 31.272085   4.01590 -0.117016949
3 O13525  6.865672   2.43884 -0.161173913
4 O13539 14.176245   7.81217 -0.075756757
5 O13541 73.275862   3.59012 -0.006988235
6 O13547 28.991597 258.99900 -0.013985507
> dim(D)
[1] 3616 4
> apply(D[,-1],2,range)
               X          Y          C
[1,]   0.3378378     0.0003 -0.7382222
[2,] 100.0000000 24556.4000  0.5582500
(Y spans several orders of magnitude, so I use log='y' in the plot function.)
I used a palette of 100 colors ranging from blue to yellow to red:
> pal = colorRampPalette(c("blue","yellow","red"))(100)
To map the D$C values to colors, I used cut with the following breaks (101 breaks from -1.2 to 1.2):
> BREAKS
[1] -1.2000 -0.8000 -0.4000 -0.3600 -0.3200 -0.2800 -0.2400 -0.2000 -0.1925
[10] -0.1850 -0.1775 -0.1700 -0.1625 -0.1550 -0.1475 -0.1400 -0.1368 -0.1336
[19] -0.1304 -0.1272 -0.1240 -0.1208 -0.1176 -0.1144 -0.1112 -0.1080 -0.1048
[28] -0.1016 -0.0984 -0.0952 -0.0920 -0.0888 -0.0856 -0.0824 -0.0792 -0.0760
[37] -0.0728 -0.0696 -0.0664 -0.0632 -0.0600 -0.0568 -0.0536 -0.0504 -0.0472
[46] -0.0440 -0.0408 -0.0376 -0.0344 -0.0312 -0.0280 -0.0248 -0.0216 -0.0184
[55] -0.0152 -0.0120 -0.0088 -0.0056 -0.0024 0.0008 0.0040 0.0072 0.0104
[64] 0.0136 0.0168 0.0200 0.0232 0.0264 0.0296 0.0328 0.0360 0.0392
[73] 0.0424 0.0456 0.0488 0.0520 0.0552 0.0584 0.0616 0.0648 0.0680
[82] 0.0712 0.0744 0.0776 0.0808 0.0840 0.0872 0.0904 0.0936 0.0968
[91] 0.1000 0.1250 0.1500 0.1750 0.2000 0.2250 0.2500 0.4875 0.7250
[100] 0.9625 1.2000
> C.levels = as.numeric(cut(D$C,breaks=BREAKS))
> length(C.levels)
[1] 3616
C.levels ranges from 2 to 98. To plot the colors, I used pal[C.levels]:
> plot( x=D$X, y=D$Y, col=pal[ C.levels ], log='y')
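For reference, the value-to-color lookup can be sketched in a self-contained form. This is an illustration only: the C values are simulated, the breaks are evenly spaced (unlike the BREAKS vector above), and findInterval is swapped in for cut as a compact alternative. The names breaks, idx and cols are hypothetical.

```r
# Simulated stand-in for D$C (hypothetical values within the real range).
set.seed(3)
C <- runif(200, -0.74, 0.56)

pal <- colorRampPalette(c("blue", "yellow", "red"))(100)

# Hypothetical evenly spaced breaks: 101 break points define 100 bins,
# one per palette color.
breaks <- seq(-1.2, 1.2, length.out = 101)

# findInterval() returns the bin index directly. Bins are closed on the
# left here, whereas cut() closes them on the right by default.
# all.inside = TRUE forces every index into 1..100.
idx <- findInterval(C, breaks, all.inside = TRUE)
cols <- pal[idx]

# Draw only in an interactive session.
if (interactive()) {
  plot(seq_along(C), C, col = cols, pch = 16)
}
```

Because 101 breaks define exactly 100 bins, every index points at a valid palette entry and no color comes out as NA.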