[R-sig-Geo] Simulating spatially autocorrelated data

Wed Sep 7 18:13:41 CEST 2011

Patrick,

Specification of the spatial weights matrix (W) is important and, in general, the connectedness of W influences the estimation and inference of the model. When you say that you do not know the "true rho", I suspect you mean that you do not know the true underlying spatial structure of the data, and thus the appropriate specification of W. One tool in the spdep package that may help is the sp.correlogram function, which computes a spatial correlogram; semivariograms have also been used for this purpose. I would be interested in what others have to say about determining the optimal level of connectedness of W.
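
As a minimal sketch of the correlogram idea, assuming simulated points on a 100 x 100 square, a 20-unit neighbour threshold, rho = 0.8 and four lag orders (all of these are illustrative choices, not recommendations):

```r
library(spdep)

set.seed(42)                                  # reproducible illustration only
N <- 200
pts <- cbind(runif(N, 0, 100), runif(N, 0, 100))

# Assumed setup: neighbours within 20 units, inverse-distance weights
nb  <- dnearneigh(pts, 0, 20)
idw <- lapply(nbdists(nb, pts), function(x) 1/x)
lw  <- nb2listw(nb, glist = idw, style = "W")

# Simulate y = (I - rho W)^{-1} e with a known rho of 0.8
y <- as.vector(invIrW(lw, 0.8) %*% rnorm(N))

# Moran's I at neighbour lag orders 1 to 4; higher-order lags are
# built from nb, and zero.policy = TRUE tolerates points that have
# no neighbours at a given lag
cg <- sp.correlogram(nb, y, order = 4, method = "I", style = "W",
                     zero.policy = TRUE)
print(cg)
```

With real data you would replace the simulated y and pts with your own variable and coordinates; how quickly Moran's I falls off across lag orders is one informal guide to how far the neighbourhood should reach.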

Two classic references regarding connectedness of W are:

Florax, R.J.G.M. and Rey, S. 1995. The Impacts of Misspecified Spatial Interaction in Linear Regression Models. In Anselin, L. and Florax, R.J.G.M. (eds), New Directions in Spatial Econometrics. Berlin: Springer, 111-135.

Bell, K.P. and Bockstael, N.E. 2000. Applying the Generalized-Moment Estimation Approach to Spatial Problems Involving Micro level Data. The Review of Economics and Statistics 82 (1): 72-82.

Terry Griffin, Ph.D. 
Associate Professor - Economics 
University of Arkansas - Division of Agriculture 
501.249.6360 (SMS)
tgriffin at uaex.edu 

----- Original Message -----
From: "Patrick Downey" <PDowney at urban.org>
To: "Roger Bivand" <Roger.Bivand at nhh.no>
Cc: r-sig-geo at stat.math.ethz.ch
Sent: Tuesday, September 6, 2011 1:02:04 PM
Subject: Re: [R-sig-Geo] Simulating spatially autocorrelated data

Hi Roger and Terry,

Thank you very much for your help and directing me towards Roger's spdep
package, which of course had everything I needed. I've now worked through
this code and done some additional simulations.

I have one remaining question. You say "the larger the distance threshold,
the less well the spatial process is captured." I was wondering if you
could further provide some information on this, either by explaining or
referencing a document or webpage with explanation.

Decreasing the distance threshold, as you suggest, radically alters the
results and I'm looking for some guidance on how to select the appropriate
distance threshold when I don't know the true rho (that is, with
non-simulated data).

Thanks,
Mitch 

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no] 
Sent: Thursday, September 01, 2011 2:20 PM
To: Downey, Patrick
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] Simulating spatially autocorrelated data

On Thu, 1 Sep 2011, Downey, Patrick wrote:

> Hello all,
>
> I'm trying to simulate a spatially autocorrelated random variable, and 
> I cannot figure out what the problem is. All I want is a simple 
> spatial lag model where
>
> Y = rho*W*Y + e
>
> Where e is a vector of iid normal random variables, rho is the 
> autocorrelation, W is a row-normalized distance matrix (a spatial 
> weights matrix), and Y is the random variable.
>
> I thought the following program should do it, but it's not working. At 
> the end of the program, I calculate Moran's I, and it is not even 
> close to rejecting the null hypothesis of no spatial autocorrelation, 
> even when rho is very high (for example, below, rho is 0.95). Can 
> someone please identify what the problem is and offer some guidance on
> how to fix it?
>
> PS - I apologize in advance, but I am not familiar with R's spatial 
> packages. I've done very little spatial analysis in R, so if there's a 
> package that can already do this, please recommend.
>
> BEGIN PROGRAM:
>
> install.packages("fields");library(fields)
> install.packages("ape");library(ape)
>
> N <- 200
> rho <- 0.95
>
> x.coord <- runif(N,0,100)
> y.coord <- runif(N,0,100)
>
> points <- cbind(x.coord,y.coord)
>
> e <- rnorm(N,0,1)
>
> dist.nonnorm <- rdist(points,points)   # Matrix of Euclidean distances
> dist <- dist.nonnorm/rowSums(dist.nonnorm)   # Row normalizing the distance matrix
> diag(dist) <- 0   # Ensuring that the main diagonal is exactly 0

I think that you are using the distances as weights, not inverse distances,
which seems more sensible.

>
> I <- diag(N)   # Identity matrix (not Moran's I)
>
> inv <- solve(I-rho*dist)   # Inverting (I - rho*W)
> y <- as.vector(inv %*% e)   # Generating data that is supposed to be spatially autocorrelated
>
> Moran.I(y,dist)   # Does not reject null hypothesis of no spatial autocorrelation
>

As Terry Griffin says, you can use spdep for this:

library(spdep)
rho <- 0.95
N <- 200
x.coord <- runif(N,0,100)
y.coord <- runif(N,0,100)
points <- cbind(x.coord,y.coord)
e <- rnorm(N,0,1)
dnb <- dnearneigh(points, 0, 150)
dsts <- nbdists(dnb, points)
idw <- lapply(dsts, function(x) 1/x)
lw <- nb2listw(dnb, glist=idw, style="W")
inv <- invIrW(lw, rho)
y <- inv %*% e
moran.test(y, lw)

to reproduce your analysis with IDW; here it is without, using the raw distances as weights:

lw <- nb2listw(dnb, glist=dsts, style="W")
inv <- invIrW(lw, rho)
y <- inv %*% e
moran.test(y, lw) # no autocorrelation

and here with a less inclusive distance threshold:

dnb <- dnearneigh(points, 0, 15)
dsts <- nbdists(dnb, points)
idw <- lapply(dsts, function(x) 1/x)
lw <- nb2listw(dnb, glist=idw, style="W")
inv <- invIrW(lw, rho)
y <- inv %*% e
moran.test(y, lw)

The larger the distance threshold, the less well the spatial process is
captured. Alternatively, use idw <- lapply(dsts, function(x) 1/(x^2)), for
example, to attenuate the weights more sharply.
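
A rough way to see both effects side by side is sketched below. The helper sim_moran() is hypothetical (it is not part of spdep), and the 20-unit "narrow" threshold and the seed are assumptions made for illustration:

```r
library(spdep)

set.seed(1)                                   # reproducible illustration only
N   <- 200
rho <- 0.95
pts <- cbind(runif(N, 0, 100), runif(N, 0, 100))
e   <- rnorm(N)

# Hypothetical helper: build weights 1/d^power for neighbours within
# `threshold`, simulate y = (I - rho W)^{-1} e with those weights, and
# return the Moran's I estimate computed under the same weights.
sim_moran <- function(threshold, power) {
  nb <- dnearneigh(pts, 0, threshold)
  w  <- lapply(nbdists(nb, pts), function(x) 1/(x^power))
  lw <- nb2listw(nb, glist = w, style = "W")
  y  <- as.vector(invIrW(lw, rho) %*% e)
  unname(moran.test(y, lw)$estimate[1])
}

res <- c(wide_idw   = sim_moran(150, 1),      # all points are neighbours, 1/d
         wide_idw2  = sim_moran(150, 2),      # all points are neighbours, 1/d^2
         narrow_idw = sim_moran(20, 1))       # local neighbours only, 1/d
print(res)
```

The same e is reused in all three runs, so differences in the printed Moran's I values come from the weights specification alone.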

Hope this clarifies,

Roger

> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics, Helleveien 30,
N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
