[R-sig-Geo] Sampling from a SAR error model

Sat Oct 7 21:38:04 CEST 2006

I finally got back to my simulation work - sampling from a SAR
model.  I found that convergence on true rho and lambda depends on
characteristics of the spatial weight matrix in my example.  I had
been drawing a set of points from a uniform distribution and
defining binary weights (row standardized) based on a distance
criteria. If I set the distance criteria to high, resulting in a
dense weight matrix, then the autoregressive coefficient
(especially from the SAR model), tended to be underestimated.  Does
anybody in the list know, off the top of their heads, a good
reference on the impact of the specification of W on the estimation
of the autoregressive parameters from the spatial lag and spatial
error model?  I can't recall seeing anything.

thanks!

Sam

Quoting Sam Field <sf2257 at columbia.edu>:

> Roger and Terry,
>
> Thanks to both of you for your reply. Yes, sigma^2 enters into
> the
> generation of e. The systematic component of the model,
> X_mat%*%parms, has a variance of around 10^2.  e is typically
> around 4^2, or...
>
> e <- rnorm(n,mean=0,sd=4)
>
>
> I went ahead and copied the code below (it isn't too long) - in
> case
> others are interested in this exersise.  The car package is only
> necessary for the n.bins function in the last lines of the code.
> So
> it is optional.  I also tried the invIrM function suggested by
> Terry.  It seems to produce the same results that I get.  It also
> appears in the code below. Any help/suggestions would be greatly
> appreciated.  I am stuck.
>
>
>
> #creating data
>
> library(spdep)
> library(car)
>
>
> #sample size
> n <- 100
>
>
>
>
>
> #coordinates
>
> x_coord <- runif(n,0,10)
> y_coord <- runif(n,0,10)
>
>
> ## w matrix and row normalized
>
> w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
>
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {w_raw[i,j]<- (sqrt((x_coord[i]-x_coord[j])^2 +
> (y_coord[i]-y_coord[j])^2))}}
>
>
> w_dummy <-  matrix(nrow=length(x_coord),ncol=length(x_coord),1)
>
>
> #defines neighborhood distance
>
> neighbor_dist <- 3
>
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
>
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {if( i == j ) w_dummy[i,j] <- 0}}
>
>
> row_sum <- rep(1,length(x_coord))
> for(i in 1:length(x_coord))
> 	{row_sum[i] <- sum(w_dummy[i,]) }
>
> w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
>
>
> # divide by row sums
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> 	{w[i,j] <- w_dummy[i,j]/row_sum[i]}}
>
>
>
> neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
> nb_list <- nb2listw(neighbors)
>
> #create some independent variables
> X1 <- rnorm(n,4,2)
> X2 <- rnorm(n,2,2)
> X3 <- rnorm(n,0,1)
> X4 <- rnorm(n,13,3)
> X5 <- rnorm(n,21,.5)
> X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)
>
> # population parameters
> parms <- c(12.4,2,-4,5,1,-.5)
> rho <- .3
>
>
>
> #begin iterations
> n_iter <- 100  #draw 1500 samples of size n
>
> estimates <- matrix(nrow=n_iter,ncol=7)
> estimates2 <- matrix(nrow=n_iter,ncol=7)
> estimates3 <- matrix(nrow=n_iter,ncol=7)
>
>
> for(iter in 1:n_iter)
> 	{
>
>
>
> e <- rnorm(n,mean=0,sd=4)
>
>
> # drawing samples
> y_lag <- (solve(diag(100)- rho*w))%*%X_mat%*%parms +
> solve(diag(100)-rho*w)%*%e
> y_error <- X_mat%*%parms + (solve(diag(100)-rho*w))%*%e
> y_error2 <- X_mat%*%parms + invIrM(neighbors,.3)%*%e
>
>
> lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list,type='lag')
> lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list)
> lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list)
>
>
> # retrieve parameter estimates
>
> estimates[iter,] <- coefficients(lag_model)
> estimates2[iter,] <- coefficients(lag_model2)
> estimates3[iter,] <- coefficients(lag_model3)
> }
>
>
>
> mean(estimates[,7])
> hist(estimates[,7],nclass=n.bins(estimates[,7]),probability=TRUE)
> lines(density(estimates[,7]),lwd=2)
>
> mean(estimates2[,7])
>
hist(estimates2[,7],nclass=n.bins(estimates2[,7]),probability=TRUE)
> lines(density(estimates2[,7]),lwd=2)
>
> mean(estimates3[,7])
>
hist(estimates3[,7],nclass=n.bins(estimates3[,7]),probability=TRUE)
> lines(density(estimates3[,7]),lwd=2)
>
>
> #data <- data.frame(cbind(y_lag, y_error,
> X1,X2,X3,X4,X5,x_coord,y_coord))
> #names(data) <- c("y_lag","y_error",names(data)[3:9])
> #write.table(data,file="C:/test.txt")
>
>
>
>
>
>
>
>
> Quoting Roger Bivand <Roger.Bivand at nhh.no>:
>
> > On Fri, 15 Sep 2006, Sam Field wrote:
> >
> > > List,
> > >
> > > I am having trouble writing a script that samples from an SAR
> > error
> > > process. I've done it successfully for a spatial lag model,
> but
> > not
> > > for the spatial error model.  For the spatial lag model, I
> have
> > the
> > > following:
> > >
> > > y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms +
> > > solve(diag(100)-p*w)%*%e
> > >
> > > where parms is a parameter vector
> > > X_mat is n by p matrix of independent variables (+ constant)
> > > e is a vector of indepndent normal deviates (mean = 0)
> > > p is the autoregressive paramter
> > > and w is a square, n by n contiguity matrix (row normalized).
> > >
> > > This works beautifully.  lagsarlm recovers parms and p
> without
> > a
> > > problem. Over repeated sampling, the estimated values are
> > centered
> > > on the value for p in the simulation.
> > >
> > >
> > > Is there something wrong with the following for the spatial
> > error
> > > model?
> > >
> > > y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
> > >
> >
> > Interesting question. Where is the sigma^2 going in your case -
> > is it in
> > the generation of e? Could you try to use the columbus dataset
> > and
> > set.seed to generate a reproducible example - it may be that
> what
> > you have
> > written does not communicate exactly what you have done?
> >
> > Roger
> >
> > >
> > > The distribution of values for p obtain from errorsarlm over
> > > repeated sampling are not centered around the value for the
> > > simulation, but are typically much lower and all over the
> > place.  I
> > > have only looked at values for p ranging from .3 to .7.
> > >
> > > any help would be greatly appreciated!
> > >
> > >
> > >
> > >
> >
> > --
> > Roger Bivand
> > Economic Geography Section, Department of Economics, Norwegian
> > School of
> > Economics and Business Administration, Helleveien 30, N-5045
> > Bergen,
> > Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> > e-mail: Roger.Bivand at nhh.no
> >
> >
>
>
> --
> Samuel H. Field, Ph.D.
> Associate Research Scholar
> Institute for Social and Economic Research and Policy (ISERP)
> Columbia University
> 420 W. 118th Street
> Mail code 3355
> New York, NY  10027
> 215-731-0106
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Samuel H. Field, Ph.D.
Associate Research Scholar
Institute for Social and Economic Research and Policy (ISERP)
Columbia University
420 W. 118th Street
Mail code 3355
New York, NY  10027
215-731-0106