[R-sig-Geo] Sampling from a SAR error model

Roger Bivand Roger.Bivand at nhh.no
Thu Sep 21 21:53:02 CEST 2006


On Thu, 21 Sep 2006, Sam Field wrote:

> Roger,
> 
> Thanks for the code.  Just now had time to get back to the problem.
> Sorry it took so long.
> 
> you wrote,
> 
> > to just take the null models in both cases, where lambda == rho by
> > definition, but the intercept term differs (I'm not sure how to insert
> > autocorrelation into the intercept in the lag model, which is probably
> > a source of error in the simulation).
> 
> I am a little confused by this statement.  I thought that the only
> difference between the lag and error model is that in the error
> model, spatial "spill overs" are restricted to the error term.  I
> was thinking that lambda would always equal rho, not just in the
> null model.  I was also thinking that the spatial lag model would
> not be properly identified without at least one covariate, since
> absent any variation in the systematic component of the model (Xb),
> what would the spatial lag model mean? If you premultiply a vector of
> constants by (I-pW)^-1, you just get back another vector of
> constants.

For row-standardised weights, yes, but not necessarily for other
styles of weighting, so the point is a little delicate. The code in
the functions tries to accommodate that, following reports from users
not using row standardisation.

> A couple of other observations. When the sample size in your example is
> increased, the lambda and rho estimates are quite a bit closer to their
> simulated values.
> 
> I added a single covariate to your example, and the recovery of
> lambda and rho performed about as well as in the null models.
> 
> Although your example didn't recover the simulated values exactly,
> they were close enough to make me think that there is something
> funny about my example. So this is where I now turn.
> 

OK, that seems sensible. Please keep the list informed of where this takes 
you.

Best wishes,

Roger

> thanks for your input!
> 
> Sam
> 
> 
> 
> Quoting Roger Bivand <Roger.Bivand at nhh.no>:
> 
> > Sam,
> >
> > Indeed perplexing. I've tried to slim things down to:
> >
> > library(spdep)
> > nsamp <- 100
> > rho <- 0.4
> > nb7rt <- cell2nb(7, 7, torus=TRUE)
> > set.seed(20060918)
> > e <- matrix(rnorm(nsamp*length(nb7rt)), nrow=length(nb7rt))
> > listw <- nb2listw(nb7rt)
> > invmat <- invIrW(listw, rho=rho)
> > y <- invmat %*% e
> > lag_res0 <- lapply(1:nsamp, function(i)
> >   lagsarlm(y[,i] ~ 1, listw=listw))
> > err_res0 <- lapply(1:nsamp, function(i)
> >   errorsarlm(y[,i] ~ 1, listw=listw))
> > summary(do.call("rbind", lapply(lag_res0, coefficients)))
> > summary(do.call("rbind", lapply(err_res0, coefficients)))
> >
> > to just take the null models in both cases, where lambda == rho by
> > definition, but the intercept term differs (I'm not sure how to insert
> > autocorrelation into the intercept in the lag model, which is probably
> > a source of error in the simulation).
> >
> > I also had a look at Haining 1990 and the use of the Cholesky lower
> > triangle of the SAR (simultaneous autoregressive) covariance matrix,
> > and tried:
> >
> > W <- listw2mat(listw)
> > Id <- diag(length(nb7rt))
> > sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
> > sarcovL <- chol((sarcov + t(sarcov))/2)
> > y <- t(sarcovL) %*% e # Kaluzny et al. 1996, p. 160
> > lag_res1 <- lapply(1:nsamp, function(i)
> >   lagsarlm(y[,i] ~ 1, listw=listw))
> > err_res1 <- lapply(1:nsamp, function(i)
> >   errorsarlm(y[,i] ~ 1, listw=listw))
> > summary(do.call("rbind", lapply(lag_res1, coefficients)))
> > summary(do.call("rbind", lapply(err_res1, coefficients)))
> >
> > which is similar to, but not the same as, just premultiplying by
> > (I - rho W)^{-1}, and where I'm very unsure how to simulate for the
> > lag model correctly. The error model simulation is, however, perhaps
> > best done this way, isn't it?
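As a sanity check on the two error-model recipes quoted above: with A = I - rho*W, the covariance of solve(A) %*% e is A^{-1} A^{-T}, which is algebraically the same matrix as solve(t(A) %*% A), i.e. the sarcov whose Cholesky factor is used there. So the two recipes imply the same covariance even though individual draws differ. A minimal base-R sketch (hypothetical 3-node row-standardised weights, not the torus grid):

```r
# Hypothetical small weights matrix (3-node path, row-standardised)
nb <- rbind(c(0, 1, 0),
            c(1, 0, 1),
            c(0, 1, 0))
W <- nb / rowSums(nb)
rho <- 0.4
A <- diag(3) - rho * W

# Covariance implied by y <- solve(A) %*% e, with e ~ N(0, I):
cov_inv <- solve(A) %*% t(solve(A))

# Covariance implied by the Cholesky route; R's chol() returns the
# upper triangle R with t(R) %*% R = x, so Var(t(R) %*% e) = t(R) %*% R
sarcov <- solve(t(A) %*% A)
R <- chol((sarcov + t(sarcov)) / 2)
cov_chol <- t(R) %*% R

all.equal(cov_inv, cov_chol)   # TRUE: same implied covariance
```

This is only about the error model; it says nothing about how to push autocorrelation into the lag model's mean, which is the open question here.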
> >
> > Running this shows that none of them retrieves the input coefficient
> > value exactly. The weights here are about as well-behaved as
> > possible, all observations have four neighbours (grid on a torus),
> > and things perhaps get worse with less regular weights.
> >
> > Simulating for the error model is discussed in the literature, but
> > the lag model is something of an oddity, certainly worth sorting out.
> >
> > Best wishes,
> >
> > Roger
> >
> >
> > On Sat, 16 Sep 2006, Sam Field wrote:
> >
> > > Roger and Terry,
> > >
> > > Thanks to both of you for your reply. Yes, sigma^2 enters into the
> > > generation of e. The systematic component of the model,
> > > X_mat%*%parms, has a variance of around 10^2.  e is typically
> > > around 4^2, or...
> > >
> > > e <- rnorm(n,mean=0,sd=4)
> > >
> > >
> > > I went ahead and copied the code below (it isn't too long) - in
> > > case others are interested in this exercise.  The car package is
> > > only necessary for the n.bins function in the last lines of the
> > > code, so it is optional.  I also tried the invIrM function
> > > suggested by Terry.  It seems to produce the same results that I
> > > get.  It also appears in the code below. Any help/suggestions
> > > would be greatly appreciated.  I am stuck.
> > >
> > >
> > >
> > > #creating data
> > >
> > > library(spdep)
> > > library(car)
> > >
> > >
> > > #sample size
> > > n <- 100
> > >
> > >
> > >
> > >
> > >
> > > #coordinates
> > >
> > > x_coord <- runif(n,0,10)
> > > y_coord <- runif(n,0,10)
> > >
> > >
> > > ## w matrix and row normalized
> > >
> > > w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> > >
> > > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > > {w_raw[i,j]<- (sqrt((x_coord[i]-x_coord[j])^2 +
> > > (y_coord[i]-y_coord[j])^2))}}
> > >
> > >
> > > w_dummy <-  matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> > >
> > >
> > > #defines neighborhood distance
> > >
> > > neighbor_dist <- 3
> > >
> > > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > > {if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
> > >
> > > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > > {if( i == j ) w_dummy[i,j] <- 0}}
> > >
> > >
> > > row_sum <- rep(1,length(x_coord))
> > > for(i in 1:length(x_coord))
> > > 	{row_sum[i] <- sum(w_dummy[i,]) }
> > >
> > > w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
> > >
> > >
> > > # divide by row sums
> > > for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> > > 	{w[i,j] <- w_dummy[i,j]/row_sum[i]}}
> > >
> > >
> > >
> > > neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
> > > nb_list <- nb2listw(neighbors)
> > >
> > > #create some independent variables
> > > X1 <- rnorm(n,4,2)
> > > X2 <- rnorm(n,2,2)
> > > X3 <- rnorm(n,0,1)
> > > X4 <- rnorm(n,13,3)
> > > X5 <- rnorm(n,21,.5)
> > > X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)
> > >
> > > # population parameters
> > > parms <- c(12.4,2,-4,5,1,-.5)
> > > rho <- .3
> > >
> > >
> > >
> > > #begin iterations
> > > n_iter <- 100  # draw 100 samples of size n
> > >
> > > estimates <- matrix(nrow=n_iter,ncol=7)
> > > estimates2 <- matrix(nrow=n_iter,ncol=7)
> > > estimates3 <- matrix(nrow=n_iter,ncol=7)
> > >
> > >
> > > for(iter in 1:n_iter)
> > > 	{
> > >
> > >
> > >
> > > e <- rnorm(n,mean=0,sd=4)
> > >
> > >
> > > # drawing samples
> > > y_lag <- (solve(diag(100)- rho*w))%*%X_mat%*%parms +
> > > solve(diag(100)-rho*w)%*%e
> > > y_error <- X_mat%*%parms + (solve(diag(100)-rho*w))%*%e
> > > y_error2 <- X_mat%*%parms + invIrM(neighbors,.3)%*%e
> > >
> > >
> > > lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
> > > listw=nb_list,type='lag')
> > > lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
> > > listw=nb_list)
> > > lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
> > > listw=nb_list)
> > >
> > >
> > > # retrieve parameter estimates
> > >
> > > estimates[iter,] <- coefficients(lag_model)
> > > estimates2[iter,] <- coefficients(lag_model2)
> > > estimates3[iter,] <- coefficients(lag_model3)
> > > }
> > >
> > >
> > >
> > > mean(estimates[,7])
> > >
> > > hist(estimates[,7],nclass=n.bins(estimates[,7]),probability=TRUE)
> > > lines(density(estimates[,7]),lwd=2)
> > >
> > > mean(estimates2[,7])
> > >
> > > hist(estimates2[,7],nclass=n.bins(estimates2[,7]),probability=TRUE)
> > > lines(density(estimates2[,7]),lwd=2)
> > >
> > > mean(estimates3[,7])
> > >
> > > hist(estimates3[,7],nclass=n.bins(estimates3[,7]),probability=TRUE)
> > > lines(density(estimates3[,7]),lwd=2)
> > >
> > >
> > > #data <- data.frame(cbind(y_lag, y_error,
> > > X1,X2,X3,X4,X5,x_coord,y_coord))
> > > #names(data) <- c("y_lag","y_error",names(data)[3:9])
> > > #write.table(data,file="C:/test.txt")
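The script above simulates both processes; the essential difference between the two data-generating processes can be shown in miniature. In the lag model the mean X %*% b is passed through the spatial multiplier (I - rho*w)^{-1}, while in the error model only the disturbance is. A hypothetical 3-unit base-R illustration (toy numbers, not the simulation above):

```r
# Toy 3-node path graph, row-standardised weights
nb <- rbind(c(0, 1, 0),
            c(1, 0, 1),
            c(0, 1, 0))
w <- nb / rowSums(nb)
rho <- 0.3
Xb <- c(2, 5, 1)                 # an arbitrary mean vector X %*% b
e  <- c(0.3, -0.1, 0.4)          # a fixed disturbance draw
M  <- solve(diag(3) - rho * w)   # the spatial multiplier (I - rho*w)^{-1}

y_lag   <- M %*% (Xb + e)        # equivalently M %*% Xb + M %*% e
y_error <- Xb + M %*% e          # multiplier applied to e only

# With the same e, the two outcomes differ by (M - I) %*% Xb,
# i.e. by the spillovers in the mean that only the lag model has:
all.equal(y_lag - y_error, (M - diag(3)) %*% Xb)   # TRUE
```

This is why recovering rho from an errorsarlm fit of lag-generated data (or vice versa) cannot be expected to work: the two likelihoods attribute the multiplier to different parts of the model.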
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Quoting Roger Bivand <Roger.Bivand at nhh.no>:
> > >
> > > > On Fri, 15 Sep 2006, Sam Field wrote:
> > > >
> > > > > List,
> > > > >
> > > > > I am having trouble writing a script that samples from a SAR
> > > > > error process. I've done it successfully for a spatial lag
> > > > > model, but not for the spatial error model.  For the spatial
> > > > > lag model, I have the following:
> > > > >
> > > > > y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms +
> > > > > solve(diag(100)-p*w)%*%e
> > > > >
> > > > > where parms is a parameter vector,
> > > > > X_mat is an n by p matrix of independent variables (+ constant),
> > > > > e is a vector of independent normal deviates (mean = 0),
> > > > > p is the autoregressive parameter,
> > > > > and w is a square, n by n contiguity matrix (row normalized).
> > > > >
> > > > > This works beautifully.  lagsarlm recovers parms and p without
> > > > > a problem. Over repeated sampling, the estimated values are
> > > > > centered on the value for p in the simulation.
> > > > >
> > > > >
> > > > > Is there something wrong with the following for the spatial
> > > > > error model?
> > > > >
> > > > > y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
> > > > >
> > > >
> > > > Interesting question. Where is the sigma^2 going in your case -
> > > > is it in the generation of e? Could you try to use the columbus
> > > > dataset and set.seed to generate a reproducible example - it may
> > > > be that what you have written does not communicate exactly what
> > > > you have done?
> > > >
> > > > Roger
> > > >
> > > > >
> > > > > The distribution of values for p obtained from errorsarlm over
> > > > > repeated sampling is not centered around the value for the
> > > > > simulation, but is typically much lower and all over the
> > > > > place.  I have only looked at values for p ranging from .3
> > > > > to .7.
> > > > >
> > > > > any help would be greatly appreciated!
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
> 
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



