[R-sig-Geo] Creating density heatmaps for geographical data

Wed Oct 13 16:30:17 CEST 2010

Dear list members,

Does anybody have suggestions for the best way of creating density heatmaps 
for geographical data? With ‘density heatmaps’ I’m not thinking of heatmaps 
as in R’s ‘heatmap’ function, but more along the lines of

http://www.heatmapapi.com/
or
http://blog.spatialkey.com/2010/02/comparing-thematic-maps-with-density-
heatmaps/

(Yes, basically, it’s just a 2D density estimate overlain a map.)

I *have* managed to create an OK heatmap using ‘sm.density’ to get a density 
estimate, converting it a format ‘levelplot’ understands, plotting it with 
‘levelplot’, with the ‘panel’ function set to include a ‘sp.polygon’ 
function to first draw a simple map before drawing the actual density with 
‘panel.levelplot’.

I’m not entirely happy with this approach, though. First note that the main 
reason I use ‘sm.density’ instead of ‘kde2d’ is that ‘sm.density’ accepts a 
‘weights’ arguments. Since each of my observations contains frequency/weight 
information, I need this, and ‘sm.density’ seems to handle it well.

However, since the ‘heatmap’ is really a density estimate, it integrates to 
1. I would instead want the heatmap colours to correspond to real 
frequencies (of course, they do, but the actual mapping is not visible on 
the colour scale). It would also be nice if I could set the range used for 
the colour scale directly, so that maps created at different times would be 
comparable in terms of their visual appearance (e.g., dark red would always 
correspond to a value of 1000). Right now I use the bpy.colors palette, but 
other suggestions are very welcome.

Another problem is that ‘sm.density’ uses a Gaussian kernel. This means that 
my *entire* map is coloured, even areas far from any observations (and of 
course overlapping the background map). My solution to this is to change 
small values of the density estimate to NA values, but selecting the right 
threshold is a rather manual procedure, and a value that works for one 
density map may not work for a map for a different place, or at a different 
time. The best solution would be a function that used a kernel with finite 
support, so that any areas with a distance greater than the bandwidth from 
the nearest observation would be transparent (have NA values).

Also, any special issues I should be aware of since this is *geographical* 
data? I’m planning to use a UTM projection to avoid the problem with the 
length of longitudes varying with the latitude. My data cover a rather large 
area, so this is not a perfect solution, but should be good enough. (But if 
there are any functions that can calculate heatmaps taking into account the 
shape of the Earth, this would of course great).

So, any suggestions or experience with this type of plotting problem? 
Perhaps I’m missing something obvious? :)

-- 
Regards,
Karl Ove Hufthammer