[R-sig-Geo] Spatial Error Model - ANOVA table

Wed Aug 20 20:44:00 CEST 2008

List,

Is it possible to decompose the variance of an outcome into trend, signal and noise components using a SAR model - analogous to what one would get with an OLS model? This doesn't seem to be straight forward. With OLS, we decompose Y into two non-overlapping components

Y = fit + error

R-square is based on the idea that the total sum of squares of y is partition neatly into two quantities - sum of square model and sum of square error.  

With the spatial error model Y is broken up into three components

Y = trend + signal + noise  

where trend = Xbeta
      signal = labmdaWy - lambdaWXbeta
      noise = Y - (trend + signal)

You would think that the total sum of square would also fit neatly into these three buckets, in which case you could easily calcluate the proportion of the total sum of squares contributed by each.  However, I have not found this to be the case.   The sum of squares from each of the three components do not add up to the total sum of squares of Y.  Possibly becase the signal component involves an infinite expansion when you substitue Y into lambdaWYy

Y = Xbeta + lambda(Xbeta + labmda(Xbeta + lambda(..) - labmdaWXbeta + u) - lambdaWXbeta + u) - labmdaWXbeta + u.

I know that the predict.errorsarlm() function uses the obeserved Y vector in its calculation of signal, but why doesn't the sum of squares (i.e. sum((signal - mean(Y))^2), sum((u^2)), and sum((XB - mean(Y))^2) add up to the total sum of squares? (i.e. sum((Y - mean(Y))?

Any help would be greatly appreciated.  Here is some example code in case that is helpful.  Please note that the example below does not "isolate" the signal component, but combines it with the trend component.

#creating data
library(spdep)

#sample size
n <- 200

#coordinates

x_coord <- runif(n,0,10)
y_coord <- runif(n,0,10)

## w matrix and row normalized

w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)

for(i in 1:length(x_coord)){for(j in 1:length(x_coord)) 
{w_raw[i,j]<- 1/(sqrt((x_coord[i]-x_coord[j])^2 + (y_coord[i]-y_coord[j])^2))^2}}

diag(w_raw) <- rep(0,n)

row_sum <- rep(1,length(x_coord))
for(i in 1:length(x_coord))  
        {row_sum[i] <- sum(w_raw[i,]) }

w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)

for(i in 1:length(x_coord)){for(j in 1:length(x_coord)) 
        {w[i,j] <- w_raw[i,j]/row_sum[i]}}

x <- rbinom(n,1,.5)

parms <- c(12.4,2)
w.listw <- mat2listw(w)

e <- rnorm(n,0,1)
p <- .6

y <- parms[1] + parms[2]*x + (solve(diag(n)- p*w))%*%e 

sim.data <- as.data.frame(cbind(y,x))
names(sim.data) <- c("y","x")
error.mod <- errorsarlm(y~x,data = sim.data,w.listw)

summary(error.mod)

X <- as.matrix(cbind(rep(1,length(sim.data$y)),sim.data$x))

trend <- X%*%coef(error.mod)[-1]
signal <-  error.mod$lambda*w%*%sim.data$y - error.mod$lambda*w%*%X%*%coef(error.mod)[-1]
fit <- trend + signal
error <-  sim.data$y - fit

SST <- sum((sim.data$y - mean(sim.data$y))^2)
SSE <- sum(error^2)
SSM <- sum(((trend + signal)- mean(sim.data$y))^2)

SSE + SSM
SST

Samuel H. Field, Ph.D.
Philadelphia VA Medical Center