[R-sig-Geo] Sampling from a SAR error model

Roger Bivand Roger.Bivand at nhh.no
Tue Sep 19 10:20:11 CEST 2006


Sam,

Indeed perplexing. I've tried to slim things down to:

library(spdep)
nsamp <- 100
rho <- 0.4
nb7rt <- cell2nb(7, 7, torus=TRUE)
set.seed(20060918)
e <- matrix(rnorm(nsamp*length(nb7rt)), nrow=length(nb7rt))
listw <- nb2listw(nb7rt)
invmat <- invIrW(listw, rho=rho)
y <- invmat %*% e
lag_res0 <- lapply(1:nsamp, function(i)
  lagsarlm(y[,i] ~ 1, listw=listw))
err_res0 <- lapply(1:nsamp, function(i)
  errorsarlm(y[,i] ~ 1, listw=listw))
summary(do.call("rbind", lapply(lag_res0, coefficients)))
summary(do.call("rbind", lapply(err_res0, coefficients)))

to just take the null models in both cases, where lambda == rho by 
definition, but the intercept term differs (I'm not sure how to insert 
autocorrelation into the intercept in the lag model, which is probably a 
source of error in the simulation.

I also had a look at Haining 1990 and the use of the Cholesky lower 
triangle of the SAR (simultaneous autoregressive) covariance matrix, and 
tried:

W <- listw2mat(listw)
Id <- diag(length(nb7rt))
sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
sarcovL <- chol((sarcov + t(sarcov))/2)
y <- t(sarcovL) %*% e # Kaluzny et al. 1996, p. 160
lag_res1 <- lapply(1:nsamp, function(i)
  lagsarlm(y[,i] ~ 1, listw=listw))
err_res1 <- lapply(1:nsamp, function(i)
  errorsarlm(y[,i] ~ 1, listw=listw))
summary(do.call("rbind", lapply(lag_res1, coefficients)))
summary(do.call("rbind", lapply(err_res1, coefficients)))

which is similar to, but not the same as just premultiplying by 
(I - \rho W)^{-1}, and where I'm very unsure how to simulate for the lag 
model correctly. The error model simulation is, however, perhaps best done 
this way, isn't it? 

Running this shows that none of them retrieve the input coefficient value 
exactly. The weights here are about as well-behaved as possible, all 
observations have four (grid on a torus), and things perhaps get worse 
with less regular weights.

Simulating for the error model is discussed in the literature, but the lag 
model is something of an oddity, certainly worth sorting out.

Best wishes,

Roger


On Sat, 16 Sep 2006, Sam Field wrote:

> Roger and Terry,
> 
> Thanks to both of you for your reply. Yes, sigma^2 enters into the
> generation of e. The systematic component of the model,
> X_mat%*%parms, has a variance of around 10^2.  e is typically
> around 4^2, or...
> 
> e <- rnorm(n,mean=0,sd=4)
> 
> 
> I went ahead and copied the code below (it isn't too long) - in case
> others are interested in this exersise.  The car package is only
> necessary for the n.bins function in the last lines of the code. So
> it is optional.  I also tried the invIrM function suggested by
> Terry.  It seems to produce the same results that I get.  It also
> appears in the code below. Any help/suggestions would be greatly
> appreciated.  I am stuck.
> 
> 
> 
> #creating data
> 
> library(spdep)
> library(car)
> 
> 
> #sample size
> n <- 100
> 
> 
> 
> 
> 
> #coordinates
> 
> x_coord <- runif(n,0,10)
> y_coord <- runif(n,0,10)
> 
> 
> ## w matrix and row normalized
> 
> w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> 
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {w_raw[i,j]<- (sqrt((x_coord[i]-x_coord[j])^2 +
> (y_coord[i]-y_coord[j])^2))}}
> 
> 
> w_dummy <-  matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> 
> 
> #defines neighborhood distance
> 
> neighbor_dist <- 3
> 
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
> 
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {if( i == j ) w_dummy[i,j] <- 0}}
> 
> 
> row_sum <- rep(1,length(x_coord))
> for(i in 1:length(x_coord))
> 	{row_sum[i] <- sum(w_dummy[i,]) }
> 
> w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> 
> 
> # divide by row sums
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> 	{w[i,j] <- w_dummy[i,j]/row_sum[i]}}
> 
> 
> 
> neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
> nb_list <- nb2listw(neighbors)
> 
> #create some independent variables
> X1 <- rnorm(n,4,2)
> X2 <- rnorm(n,2,2)
> X3 <- rnorm(n,0,1)
> X4 <- rnorm(n,13,3)
> X5 <- rnorm(n,21,.5)
> X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)
> 
> # population parameters
> parms <- c(12.4,2,-4,5,1,-.5)
> rho <- .3
> 
> 
> 
> #begin iterations
> n_iter <- 100  #draw 1500 samples of size n
> 
> estimates <- matrix(nrow=n_iter,ncol=7)
> estimates2 <- matrix(nrow=n_iter,ncol=7)
> estimates3 <- matrix(nrow=n_iter,ncol=7)
> 
> 
> for(iter in 1:n_iter)
> 	{
> 
> 
> 
> e <- rnorm(n,mean=0,sd=4)
> 
> 
> # drawing samples
> y_lag <- (solve(diag(100)- rho*w))%*%X_mat%*%parms +
> solve(diag(100)-rho*w)%*%e
> y_error <- X_mat%*%parms + (solve(diag(100)-rho*w))%*%e
> y_error2 <- X_mat%*%parms + invIrM(neighbors,.3)%*%e
> 
> 
> lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list,type='lag')
> lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list)
> lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
> listw=nb_list)
> 
> 
> # retrieve parameter estimates
> 
> estimates[iter,] <- coefficients(lag_model)
> estimates2[iter,] <- coefficients(lag_model2)
> estimates3[iter,] <- coefficients(lag_model3)
> }
> 
> 
> 
> mean(estimates[,7])
> hist(estimates[,7],nclass=n.bins(estimates[,7]),probability=TRUE)
> lines(density(estimates[,7]),lwd=2)
> 
> mean(estimates2[,7])
> hist(estimates2[,7],nclass=n.bins(estimates2[,7]),probability=TRUE)
> lines(density(estimates2[,7]),lwd=2)
> 
> mean(estimates3[,7])
> hist(estimates3[,7],nclass=n.bins(estimates3[,7]),probability=TRUE)
> lines(density(estimates3[,7]),lwd=2)
> 
> 
> #data <- data.frame(cbind(y_lag, y_error,
> X1,X2,X3,X4,X5,x_coord,y_coord))
> #names(data) <- c("y_lag","y_error",names(data)[3:9])
> #write.table(data,file="C:/test.txt")
> 
> 
> 
> 
> 
> 
> 
> 
> Quoting Roger Bivand <Roger.Bivand at nhh.no>:
> 
> > On Fri, 15 Sep 2006, Sam Field wrote:
> >
> > > List,
> > >
> > > I am having trouble writing a script that samples from an SAR
> > error
> > > process. I've done it successfully for a spatial lag model, but
> > not
> > > for the spatial error model.  For the spatial lag model, I have
> > the
> > > following:
> > >
> > > y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms +
> > > solve(diag(100)-p*w)%*%e
> > >
> > > where parms is a parameter vector
> > > X_mat is n by p matrix of independent variables (+ constant)
> > > e is a vector of indepndent normal deviates (mean = 0)
> > > p is the autoregressive paramter
> > > and w is a square, n by n contiguity matrix (row normalized).
> > >
> > > This works beautifully.  lagsarlm recovers parms and p without
> > a
> > > problem. Over repeated sampling, the estimated values are
> > centered
> > > on the value for p in the simulation.
> > >
> > >
> > > Is there something wrong with the following for the spatial
> > error
> > > model?
> > >
> > > y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
> > >
> >
> > Interesting question. Where is the sigma^2 going in your case -
> > is it in
> > the generation of e? Could you try to use the columbus dataset
> > and
> > set.seed to generate a reproducible example - it may be that what
> > you have
> > written does not communicate exactly what you have done?
> >
> > Roger
> >
> > >
> > > The distribution of values for p obtain from errorsarlm over
> > > repeated sampling are not centered around the value for the
> > > simulation, but are typically much lower and all over the
> > place.  I
> > > have only looked at values for p ranging from .3 to .7.
> > >
> > > any help would be greatly appreciated!
> > >
> > >
> > >
> > >
> >
> > --
> > Roger Bivand
> > Economic Geography Section, Department of Economics, Norwegian
> > School of
> > Economics and Business Administration, Helleveien 30, N-5045
> > Bergen,
> > Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> > e-mail: Roger.Bivand at nhh.no
> >
> >
> 
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list