[R] SPDEP Package, neighbours list for Moran's I on large grid dataset
rhelp at sticksoftware.com
Wed Apr 20 15:05:20 CEST 2011
Laurent Jégou wrote:
> Hello list members, i'd like to calculate the Moran I on a large dataset,
> a 8640x3432 grid of values.
> When i try to create the neighbours list with the cell2nb function, on
> such a scale, R works for several hours and seems to crash. I'm using the last
> version (2.13), 64 bits, on a mac pro with 4go of memory.
> I think that i'm doing it wrong :-)
> I'd like to use a "moving window" weight matrix instead of a full scale
> one, but i was not lucky enough to find a solution in the docs.
I confronted this problem recently, and decided to roll my own. For a large dataset, an exhaustive computation of Moran's I is very time-consuming, as you say; but an estimate of it, based on a subset of pairs chosen randomly, seemed (for my data, at least) to converge quite nicely. Here's my code. It assumes a square grid, so you'll need to adapt it for your non-square grid, but that should be trivial. To determine the right number of pairs for a good estimate in my application, I called this function with various numbers of pairs, and plotted the estimate produced as a function of the number of pairs used, and observed good convergence by 200,000 pairs. You will probably want to do this yourself, for your dataset. 200,000 pairs took just a second or two to run, on my machine, so this is much, much faster than an exhaustive calculation. As a bonus, this code also gives you Geary's C. Oh, and the code below assumes a wraparound lattice (i.e. a torus, i.e. periodic boundaries), which may not be what you want; just get rid of the d_wraparound stuff and the following pmax() call if you want non-wraparound boundaries, and that should work fine, I think. Not sure what you mean by the moving window thing, so I leave that as an exercise for the reader. :->
I've never made an R package, and I'm not sure there's much demand for this code (is there?), so I'm presently unlikely to package it up. However, if someone who owns a package related to this problem would like to adopt this code, generalize it to non-square lattices, add a flag to choose periodic or non-periodic boundaries, etc., feel free to do so; I hereby place it in the public domain. I'd just like to hear about it if someone does this, and receive credit somewhere, that's all. I'm not super-good in R, either, so if there's a better way to do some things, like the somewhat hacked random index generation (trying to avoid floating point issues when generating a random integer), please let me know. I'm always interested in learning how to do things better.
And of course this code is provided without warranty, may have bugs, etc.
correlation_stats <- function(p, n_pairs=200000)
# Compute Moran's I and Geary's C for the landscape p. This is tricky because the exact calculation
# would be far too time-consuming, as it involves comparison of every point to every other point.
# So I am trying to estimate it here with a small subset of all possible point comparisons.
p_size <- NROW(p) # 512
p_length <- length(p) # 262144
mean_p <- mean(p)
pv <- as.vector(p)
# select points and look up their values; for speed, points are selected by their vector index
p1ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
p1x <- (p1ix %/% p_size) + 1
p1y <- (p1ix %% p_size) + 1
p1ix <- p1ix + 1
p2ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
p2x <- (p2ix %/% p_size) + 1
p2y <- (p2ix %% p_size) + 1
p2ix <- p2ix + 1
v1 <- pv[p1ix] - mean_p
v2 <- pv[p2ix] - mean_p
# Calculate distances between point pairs, both directly and wrapping around
# The average direct distance is much smaller than the average wraparound distance,
# because points near the center vertically have a maximum direct distance near 256,
# but a minimum wraparound distance near 256. The Moran's I estimate from wraparound
# distances is different, as a result. Rupert recommends taking the shorter of the
# two distances, whichever it is, because that keeps us within the meaningful scale
# of our space, before it just starts repeating due to periodicity.
d_direct <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p1y - p2y) ^ 2))
d_direct[is.infinite(d_direct)] <- 0
d_wraparound <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p_size - abs(p1y - p2y)) ^ 2))
d_wraparound[is.infinite(d_wraparound)] <- 0
d <- pmax(d_direct, d_wraparound) # max because we want the min distance, and these are 1/distance
# precalculate some shared terms
sum_d <- sum(d)
sum_v1_sq <- sum(v1 ^ 2)
# Moran's I: -1 is perfect dispersion, 1 is perfect correlation, 0 is a random spatial pattern
M_I <- (n_pairs / sum_d) * (sum(d * v1 * v2) / ((sum_v1_sq + sum(v2 ^ 2)) / 2))
# (n_pairs / sum(d_direct)) * (sum(d_direct * v1 * v2) / ((sum(v1 ^ 2) + sum(v2 ^ 2)) / 2))
# (n_pairs / sum(d_wraparound)) * (sum(d_wraparound * v1 * v2) / ((sum(v1 ^ 2) + sum(v2 ^ 2)) / 2))
# Geary's C: 0 is positive spatial autocorrelation, 2 is negative spatial autocorrelation, 1 is a random spatial pattern
G_C <- ((n_pairs - 1) * sum(d * ((v1 - v2) ^ 2))) / (2 * sum_d * sum_v1_sq)
More information about the R-help