[R-sig-Geo] Sampling from a SAR error model

Sam Field sf2257 at columbia.edu
Thu Sep 21 21:39:05 CEST 2006


Roger,

Thanks for the code.  I've only just now had time to get back to the
problem; sorry it took so long.

You wrote:

> to just take the null models in both cases, where lambda == rho by
> definition, but the intercept term differs (I'm not sure how to insert
> autocorrelation into the intercept in the lag model, which is probably
> a source of error in the simulation).

I am a little confused by this statement.  I thought that the only
difference between the lag and error models is that in the error
model, spatial "spillovers" are restricted to the error term.  I was
thinking that lambda would always equal rho, not just in the null
model.  I was also thinking that the spatial lag model would not be
properly identified without at least one covariate, since absent any
variation in the systematic component of the model (Xb), what would
the spatial lag model mean?  If you premultiply a vector of constants
by (I - rho*W)^-1, you just get back another vector of constants.  A
couple of other observations:

When the sample size in your example is increased, the lambda and
rho estimates are quite a bit closer to their simulated values.

I added a single covariate to your example, and the recovery of
lambda and rho performed about as well as in the null models.

Although your example didn't recover the simulated values exactly,
they were close enough to make me think that there is something funny
about my example.  So that is where I will now turn.
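A quick base-R check of the point above about premultiplying a vector
of constants: with a row-standardised W each row sums to 1, so
(I - rho*W)^-1 applied to a constant vector only rescales it by
1/(1 - rho).  This is a minimal sketch; the 4x4 W is made up purely
for illustration.

```r
# Sketch: row-standardised W has rowSums(W) == 1, so for a constant
# vector c*1 we get solve(I - rho*W) %*% (c*1) = (c/(1 - rho)) * 1,
# which is still a vector of constants.  W below is hypothetical.
W <- matrix(c(0, 1, 0, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- W / rowSums(W)                       # row-standardize
rho <- 0.4
out <- solve(diag(4) - rho * W) %*% rep(5, 4)
range(out)                                # both ends: 5/(1 - 0.4) = 8.333...
```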

Thanks for your input!

Sam



Quoting Roger Bivand <Roger.Bivand at nhh.no>:

> Sam,
>
> Indeed perplexing. I've tried to slim things down to:
>
> library(spdep)
> nsamp <- 100
> rho <- 0.4
> nb7rt <- cell2nb(7, 7, torus=TRUE)
> set.seed(20060918)
> e <- matrix(rnorm(nsamp*length(nb7rt)), nrow=length(nb7rt))
> listw <- nb2listw(nb7rt)
> invmat <- invIrW(listw, rho=rho)
> y <- invmat %*% e
> lag_res0 <- lapply(1:nsamp, function(i)
>   lagsarlm(y[,i] ~ 1, listw=listw))
> err_res0 <- lapply(1:nsamp, function(i)
>   errorsarlm(y[,i] ~ 1, listw=listw))
> summary(do.call("rbind", lapply(lag_res0, coefficients)))
> summary(do.call("rbind", lapply(err_res0, coefficients)))
>
> to just take the null models in both cases, where lambda == rho by
> definition, but the intercept term differs (I'm not sure how to insert
> autocorrelation into the intercept in the lag model, which is probably
> a source of error in the simulation).
>
> I also had a look at Haining 1990 and the use of the Cholesky lower
> triangle of the SAR (simultaneous autoregressive) covariance matrix,
> and tried:
>
> W <- listw2mat(listw)
> Id <- diag(length(nb7rt))
> sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
> sarcovL <- chol((sarcov + t(sarcov))/2)
> y <- t(sarcovL) %*% e # Kaluzny et al. 1996, p. 160
> lag_res1 <- lapply(1:nsamp, function(i)
>   lagsarlm(y[,i] ~ 1, listw=listw))
> err_res1 <- lapply(1:nsamp, function(i)
>   errorsarlm(y[,i] ~ 1, listw=listw))
> summary(do.call("rbind", lapply(lag_res1, coefficients)))
> summary(do.call("rbind", lapply(err_res1, coefficients)))
>
> which is similar to, but not the same as, just premultiplying by
> (I - rho W)^{-1}, and where I'm very unsure how to simulate correctly
> for the lag model. The error model simulation is, however, perhaps
> best done this way, isn't it?
>
> Running this shows that none of them retrieve the input coefficient
> value exactly. The weights here are about as well-behaved as possible
> (all observations have four neighbours on the torus grid), and things
> perhaps get worse with less regular weights.
>
> Simulating for the error model is discussed in the literature, but
> the lag model is something of an oddity, certainly worth sorting out.
>
> Best wishes,
>
> Roger
>
>
> On Sat, 16 Sep 2006, Sam Field wrote:
>
> > Roger and Terry,
> >
> > Thanks to both of you for your reply. Yes, sigma^2 enters into the
> > generation of e. The systematic component of the model,
> > X_mat%*%parms, has a variance of around 10^2.  e is typically
> > around 4^2, or...
> >
> > e <- rnorm(n, mean=0, sd=4)
> >
> >
> > I went ahead and copied the code below (it isn't too long) in case
> > others are interested in this exercise.  The car package is only
> > necessary for the n.bins function in the last lines of the code, so
> > it is optional.  I also tried the invIrM function suggested by
> > Terry.  It seems to produce the same results that I get; it also
> > appears in the code below. Any help/suggestions would be greatly
> > appreciated.  I am stuck.
> >
> >
> >
> > #creating data
> >
> > library(spdep)
> > library(car)
> >
> >
> > #sample size
> > n <- 100
> >
> > #coordinates
> >
> > x_coord <- runif(n,0,10)
> > y_coord <- runif(n,0,10)
> >
> >
> > ## w matrix and row normalized
> >
> > w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> >
> > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > {w_raw[i,j]<- (sqrt((x_coord[i]-x_coord[j])^2 +
> > (y_coord[i]-y_coord[j])^2))}}
> >
> >
> > w_dummy <-  matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> >
> >
> > #defines neighborhood distance
> >
> > neighbor_dist <- 3
> >
> > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > {if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
> >
> > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > {if( i == j ) w_dummy[i,j] <- 0}}
> >
> >
> > row_sum <- rep(1,length(x_coord))
> > for(i in 1:length(x_coord))
> > 	{row_sum[i] <- sum(w_dummy[i,]) }
> >
> > w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> >
> >
> > # divide by row sums
> > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > 	{w[i,j] <- w_dummy[i,j]/row_sum[i]}}
> >
> >
> >
> > neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
> > nb_list <- nb2listw(neighbors)
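[As an aside, the nested loops above can be collapsed: dist() builds
the full pairwise distance matrix in one call.  A base-R sketch,
equivalent in spirit but using made-up coordinates, and guarding
against points with no neighbours:]

```r
# Sketch: vectorized construction of a row-standardised distance-band
# weights matrix, equivalent in spirit to the loops above.
# Coordinates are made up for illustration.
set.seed(1)
n <- 50
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))
d <- as.matrix(dist(coords))        # all pairwise Euclidean distances
w_dummy <- (d <= 3) * 1             # 1 if within neighbour distance 3
diag(w_dummy) <- 0                  # a point is not its own neighbour
rs <- rowSums(w_dummy)
w <- w_dummy / pmax(rs, 1)          # divides row i by rs[i]; isolated
                                    # points (rs == 0) keep all-zero rows
```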
> >
> > #create some independent variables
> > X1 <- rnorm(n,4,2)
> > X2 <- rnorm(n,2,2)
> > X3 <- rnorm(n,0,1)
> > X4 <- rnorm(n,13,3)
> > X5 <- rnorm(n,21,.5)
> > X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)
> >
> > # population parameters
> > parms <- c(12.4,2,-4,5,1,-.5)
> > rho <- .3
> >
> >
> >
> > #begin iterations
> > n_iter <- 100  # number of simulated samples of size n
> >
> > estimates <- matrix(nrow=n_iter,ncol=7)
> > estimates2 <- matrix(nrow=n_iter,ncol=7)
> > estimates3 <- matrix(nrow=n_iter,ncol=7)
> >
> >
> > for(iter in 1:n_iter)
> > 	{
> >
> >
> >
> > e <- rnorm(n,mean=0,sd=4)
> >
> >
> > # drawing samples
> > y_lag <- solve(diag(n) - rho*w) %*% X_mat %*% parms +
> >   solve(diag(n) - rho*w) %*% e
> > y_error <- X_mat %*% parms + solve(diag(n) - rho*w) %*% e
> > y_error2 <- X_mat %*% parms + invIrM(neighbors, rho) %*% e
> >
> >
> > lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
> > listw=nb_list,type='lag')
> > lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
> > listw=nb_list)
> > lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
> > listw=nb_list)
> >
> >
> > # retrieve parameter estimates
> >
> > estimates[iter,] <- coefficients(lag_model)
> > estimates2[iter,] <- coefficients(lag_model2)
> > estimates3[iter,] <- coefficients(lag_model3)
> > }
> >
> >
> >
> > mean(estimates[,7])
> > hist(estimates[,7], nclass=n.bins(estimates[,7]), probability=TRUE)
> > lines(density(estimates[,7]), lwd=2)
> >
> > mean(estimates2[,7])
> > hist(estimates2[,7], nclass=n.bins(estimates2[,7]), probability=TRUE)
> > lines(density(estimates2[,7]), lwd=2)
> >
> > mean(estimates3[,7])
> > hist(estimates3[,7], nclass=n.bins(estimates3[,7]), probability=TRUE)
> > lines(density(estimates3[,7]), lwd=2)
> >
> >
> >
> > #data <- data.frame(cbind(y_lag, y_error,
> > X1,X2,X3,X4,X5,x_coord,y_coord))
> > #names(data) <- c("y_lag","y_error",names(data)[3:9])
> > #write.table(data,file="C:/test.txt")
> >
> >
> > Quoting Roger Bivand <Roger.Bivand at nhh.no>:
> >
> > > On Fri, 15 Sep 2006, Sam Field wrote:
> > >
> > > > List,
> > > >
> > > > I am having trouble writing a script that samples from an SAR
> > > > error process. I've done it successfully for a spatial lag
> > > > model, but not for the spatial error model.  For the spatial
> > > > lag model, I have the following:
> > > >
> > > > y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms +
> > > >   solve(diag(100)-p*w)%*%e
> > > >
> > > > where parms is a parameter vector,
> > > > X_mat is an n by p matrix of independent variables (+ constant),
> > > > e is a vector of independent normal deviates (mean = 0),
> > > > p is the autoregressive parameter,
> > > > and w is a square, n by n contiguity matrix (row-normalized).
> > > >
> > > > This works beautifully.  lagsarlm recovers parms and p without
> > > > a problem. Over repeated sampling, the estimated values are
> > > > centered on the value for p in the simulation.
> > > >
> > > >
> > > > Is there something wrong with the following for the spatial
> > > > error model?
> > > >
> > > > y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
> > > >
> > >
> > > Interesting question. Where is the sigma^2 going in your case?
> > > Is it in the generation of e? Could you try to use the columbus
> > > dataset and set.seed to generate a reproducible example? It may
> > > be that what you have written does not communicate exactly what
> > > you have done.
> > >
> > > Roger
> > >
> > > >
> > > > The distribution of values for p obtained from errorsarlm over
> > > > repeated sampling is not centered around the value for the
> > > > simulation, but is typically much lower and all over the place.
> > > > I have only looked at values for p ranging from .3 to .7.
> > > >
> > > > any help would be greatly appreciated!
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Roger Bivand
> > > Economic Geography Section, Department of Economics, Norwegian
> > > School of Economics and Business Administration, Helleveien 30,
> > > N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> > > e-mail: Roger.Bivand at nhh.no
> > >
> > >
> >
> >
> >
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School
> of Economics and Business Administration, Helleveien 30, N-5045
> Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>


-- 
Samuel H. Field, Ph.D.
Associate Research Scholar
Institute for Social and Economic Research and Policy (ISERP)
Columbia University
420 W. 118th Street
Mail code 3355
New York, NY  10027
215-731-0106



