[R-sig-Geo] Creating density heatmaps for geographical data

jeremy.raw at dot.gov
Tue Oct 19 15:52:53 CEST 2010


I'll offer my two cents as someone who has been using spatstat a lot recently to do spatial point process analysis.

I think it's entirely useful, when organizing spatial analysis libraries, to look at the sp package as a master package for getting spatial data in and out of R and for performing basic metadata operations on such data (so one can simplify a lot by using OGR/GDAL or projection functionality through this interface).

Packages like spatstat (or raster, or whatever) might include some shortcuts to directly import common data from the world at large, but they are also armed with functions to convert in and out of sp data types (so the burden of wide-ranging input/output formats and geospatial projection is carried through sp).  The main need for data conversion functions in packages like spatstat is to take data in forms the user is likely to have and convert them into a structure that permits efficient application of the algorithms to which the package is dedicated (spatial point process analysis, raster evaluation, etc.).  So I see nothing wrong with spatstat working with its own specialized types that can be built out of more general spatial data.  In fact, I think it's rather nice, because it helps you form an effective mental map of what you're doing and thus avoid some basic errors (e.g. trying to get meaningful spatial point process analyses when you haven't identified a window).
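To make that concrete, here is a minimal sketch of the sp-to-spatstat hand-off (the point data are made up, and I'm assuming reasonably current versions of sp and spatstat):

```r
library(sp)       # general-purpose spatial classes and I/O hub
library(spatstat) # specialized point process machinery

# Hypothetical point data, arriving in sp form
set.seed(1)
pts <- SpatialPoints(cbind(x = runif(50), y = runif(50)))

# Convert to spatstat's ppp type, supplying the observation window
# explicitly from the sp bounding box (spatstat will not guess one)
bb <- bbox(pts)
pp <- ppp(coordinates(pts)[, 1], coordinates(pts)[, 2],
          window = owin(bb[1, ], bb[2, ]))

# Now the specialized machinery applies, e.g. a kernel density
# surface (an 'im' object) suitable for plotting as a heatmap
dens <- density(pp)
```

The only "glue" here is the two lines that build the window and the ppp; everything before is sp's job, everything after is spatstat's.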

For novices trying to get their minds around what is going on with R spatial packages, that observation suggests the useful strategy of first trying to understand each package in its own terms (why was it created, what is it supposed to do, what functions does it provide, and what are the requirements of the data structures on which those functions operate), then trying to match one's own problem to the package functionality, and finally gluing the data and operations together with a minimal set of conversions.  Naturally, that's an iterative process in practice.  But that approach has helped me avoid the perilous shortcut of simply trying to apply function X from package P to my data D (which inevitably means that I impose my own fantasy of what X should do, and then bog down dealing with "bugs" that emerge directly from my own lack of understanding of how the problem, the tools, and the data properly fit together).

From a package interface standpoint, I think a significant source of confusion for newcomers is the fact that you can do (for example) raster calculations on lots of different types of objects, using different functions, with different interfaces:  an im object in spatstat, for example, would use 'eval.im', but raster provides the 'calc' function, and sp lets you do basic data-frame computations transparently.  And of course, there's the fact that the standard plotting facilities for each of these packages provide different defaults and capabilities (which for me is the biggest headache -- I end up doing a lot of data conversions just so I don't have to remember all the alternative options and defaults).
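For instance, the same trivial bit of map algebra looks like this under each of the three interfaces (a sketch only -- the hand-offs between packages in real work need care with extents and projections, which I'm deliberately ignoring here):

```r
library(spatstat)
library(raster)

# A toy surface as a spatstat 'im' object
Z <- as.im(function(x, y) x + y, W = owin(c(0, 1), c(0, 1)))

# spatstat: raster algebra through eval.im()
Z2 <- eval.im(Z^2 + 1)

# raster: the same operation through calc()
r  <- raster(as.matrix(Z))   # crude hand-off; extent/CRS are lost here
r2 <- calc(r, function(v) v^2 + 1)

# sp: grid values live in a data.frame slot, so plain arithmetic works
# (I'm assuming the default layer name "layer" survives the coercion)
spg <- as(r, "SpatialGridDataFrame")
spg$layer2 <- spg$layer^2 + 1
```

Three spellings of one operation -- which is exactly the sort of thing that sends a newcomer hunting through three manuals.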

This gets to Barry's point about spatstat (or almost any well-developed spatial package) providing a lot of functionality that supports the primary mission, but that also has more general applicability.  For users who specialize in spatial point process analysis, it's very nice to have a complete set of tools on hand in spatstat so you don't have to go hunting around.  But the pitfall is that if you add a lot of "extras", it encourages users to expect that the "perilous shortcut" I described above is available to them (that they can just throw random data at a function they only half understand and magically get the answer they are looking for).

In the long run, it would probably be helpful (perhaps in sp or some supporting package such as maptools or rgeos) to "establish" some generic functions for additional operations such as raster calculations, which individual packages could then specialize around a common interface, so that common geospatial functions can easily be perceived as common.  But the example of what has happened with the spplot output functions in sp (specifically, that specializations are not widely available in other packages) is perhaps a cautionary tale about how such an approach is likely to work in practice.  I suspect this kind of anarchy reflects both the strength and weakness of distributed package development by unrelated teams:  each team does what it needs (including supporting tools) with greater or lesser attention to what might be available in the larger environment or other packages.  Teams and users then end up working with whichever version of the overlapping functionality they find easiest and most comprehensible.  That's good, because such "competition" and selection helps drive innovation as we all collectively contribute to figuring out what works best.  But it's also bad, especially for novices, who end up writing strange mash-ups of data conversions, performing the same operation here and there with partially equivalent functions from several different packages, and struggling with general confusion about what's going on and why.

Notwithstanding the challenges, I am deeply grateful to all the R spatial developers who have put together this amazing set of useful resources...

Jeremy Raw, P.E., AICP
FHWA Office of Planning
jeremy.raw at dot.gov
(202) 366-0986


-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of Barry Rowlingson
Sent: Tuesday, October 19, 2010 7:54 AM
To: Karl Ove Hufthammer
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] Creating density heatmaps for geographical data

On Tue, Oct 19, 2010 at 11:58 AM, Karl Ove Hufthammer <karl at huftis.org> wrote:

> And though the 'window' element of 'ppp' objects may be of use to some
> people, I haven't had any use for it. The annoying thing here is that the
> constructor doesn't generate the window automatically, based on the extent /
> bounding box of the data, and doesn't have an *option* for doing this, either.
> Whenever I have used 'spatstat' (not too often), I have had to spend too
> much time looking up how the window should be specified. Having [0,1] ×
> [0,1] as the *default* window, and excluding any points outside it, seems
> like a strange design decision.

 I had this 'argument' with Rolf and Adrian a few years ago during a
very nice stay in Perth with them. spatstat is not about (geo)spatial
data - it's about statistical point pattern analysis. A statistical
point pattern is only well-defined when there's a window. Otherwise it
ain't a point pattern. And spatstat doesn't have any business with
non-spatial point patterns! :)
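To illustrate with some made-up coordinates: the default unit-square window silently rejects stray points, while an explicit window built from the data keeps them all.

```r
library(spatstat)

x <- c(0.2, 0.5, 1.7)  # the third point lies outside the unit square
y <- c(0.3, 0.8, 0.4)

# Default window is [0,1] x [0,1]; the stray point is rejected
# (ppp issues a warning, suppressed here for brevity)
p1 <- suppressWarnings(ppp(x, y))

# An explicit window from the data's own range keeps every point
p2 <- ppp(x, y, window = owin(range(x), range(y)))
```

So the "fix" Karl wants is one line of owin() -- but spatstat making you write that line is the point: *you* must assert what the window is.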

 There's a lot of functionality in spatstat that people want to use in
other contexts, such as some of the transformations or window
manipulation functions, and I think these could be usefully taken out
and put into a package that works with sp-class objects.

 But ppp objects are perfectly understandable and sensible if all you
do is point pattern analysis!

Barry

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
