[R-sig-Geo] Sampling from a SAR error model

Sat Sep 16 17:48:46 CEST 2006

Roger and Terry,

Thanks to both of you for your reply. Yes, sigma^2 enters into the
generation of e. The systematic component of the model,
X_mat%*%parms, has a variance of around 10^2.  e is typically
around 4^2, or...

e <- rnorm(n,mean=0,sd=4)

I went ahead and copied the code below (it isn't too long) - in case
others are interested in this exersise.  The car package is only
necessary for the n.bins function in the last lines of the code. So
it is optional.  I also tried the invIrM function suggested by
Terry.  It seems to produce the same results that I get.  It also
appears in the code below. Any help/suggestions would be greatly
appreciated.  I am stuck.

#creating data

library(spdep)
library(car)

#sample size
n <- 100

#coordinates

x_coord <- runif(n,0,10)
y_coord <- runif(n,0,10)

## w matrix and row normalized

w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)

for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{w_raw[i,j]<- (sqrt((x_coord[i]-x_coord[j])^2 +
(y_coord[i]-y_coord[j])^2))}}

w_dummy <-  matrix(nrow=length(x_coord),ncol=length(x_coord),1)

#defines neighborhood distance

neighbor_dist <- 3

for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}

for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{if( i == j ) w_dummy[i,j] <- 0}}

row_sum <- rep(1,length(x_coord))
for(i in 1:length(x_coord))
	{row_sum[i] <- sum(w_dummy[i,]) }

w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)

# divide by row sums
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
	{w[i,j] <- w_dummy[i,j]/row_sum[i]}}

neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
nb_list <- nb2listw(neighbors)

#create some independent variables
X1 <- rnorm(n,4,2)
X2 <- rnorm(n,2,2)
X3 <- rnorm(n,0,1)
X4 <- rnorm(n,13,3)
X5 <- rnorm(n,21,.5)
X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)

# population parameters
parms <- c(12.4,2,-4,5,1,-.5)
rho <- .3

#begin iterations
n_iter <- 100  #draw 1500 samples of size n

estimates <- matrix(nrow=n_iter,ncol=7)
estimates2 <- matrix(nrow=n_iter,ncol=7)
estimates3 <- matrix(nrow=n_iter,ncol=7)

for(iter in 1:n_iter)
	{

e <- rnorm(n,mean=0,sd=4)

# drawing samples
y_lag <- (solve(diag(100)- rho*w))%*%X_mat%*%parms +
solve(diag(100)-rho*w)%*%e
y_error <- X_mat%*%parms + (solve(diag(100)-rho*w))%*%e
y_error2 <- X_mat%*%parms + invIrM(neighbors,.3)%*%e

lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list,type='lag')
lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list)
lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list)

# retrieve parameter estimates

estimates[iter,] <- coefficients(lag_model)
estimates2[iter,] <- coefficients(lag_model2)
estimates3[iter,] <- coefficients(lag_model3)
}

mean(estimates[,7])
hist(estimates[,7],nclass=n.bins(estimates[,7]),probability=TRUE)
lines(density(estimates[,7]),lwd=2)

mean(estimates2[,7])
hist(estimates2[,7],nclass=n.bins(estimates2[,7]),probability=TRUE)
lines(density(estimates2[,7]),lwd=2)

mean(estimates3[,7])
hist(estimates3[,7],nclass=n.bins(estimates3[,7]),probability=TRUE)
lines(density(estimates3[,7]),lwd=2)

#data <- data.frame(cbind(y_lag, y_error,
X1,X2,X3,X4,X5,x_coord,y_coord))
#names(data) <- c("y_lag","y_error",names(data)[3:9])
#write.table(data,file="C:/test.txt")

Quoting Roger Bivand <Roger.Bivand at nhh.no>:

> On Fri, 15 Sep 2006, Sam Field wrote:
>
> > List,
> >
> > I am having trouble writing a script that samples from an SAR
> error
> > process. I've done it successfully for a spatial lag model, but
> not
> > for the spatial error model.  For the spatial lag model, I have
> the
> > following:
> >
> > y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms +
> > solve(diag(100)-p*w)%*%e
> >
> > where parms is a parameter vector
> > X_mat is n by p matrix of independent variables (+ constant)
> > e is a vector of indepndent normal deviates (mean = 0)
> > p is the autoregressive paramter
> > and w is a square, n by n contiguity matrix (row normalized).
> >
> > This works beautifully.  lagsarlm recovers parms and p without
> a
> > problem. Over repeated sampling, the estimated values are
> centered
> > on the value for p in the simulation.
> >
> >
> > Is there something wrong with the following for the spatial
> error
> > model?
> >
> > y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
> >
>
> Interesting question. Where is the sigma^2 going in your case -
> is it in
> the generation of e? Could you try to use the columbus dataset
> and
> set.seed to generate a reproducible example - it may be that what
> you have
> written does not communicate exactly what you have done?
>
> Roger
>
> >
> > The distribution of values for p obtain from errorsarlm over
> > repeated sampling are not centered around the value for the
> > simulation, but are typically much lower and all over the
> place.  I
> > have only looked at values for p ranging from .3 to .7.
> >
> > any help would be greatly appreciated!
> >
> >
> >
> >
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian
> School of
> Economics and Business Administration, Helleveien 30, N-5045
> Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>

-- 
Samuel H. Field, Ph.D.
Associate Research Scholar
Institute for Social and Economic Research and Policy (ISERP)
Columbia University
420 W. 118th Street
Mail code 3355
New York, NY  10027
215-731-0106