[R-sig-Geo] Spatial Error Model - ANOVA table
Samuel Field
fieldsh at mail.med.upenn.edu
Wed Aug 20 20:44:00 CEST 2008
List,
Is it possible to decompose the variance of an outcome into trend, signal and noise components using a SAR model - analogous to what one would get with an OLS model? This doesn't seem to be straight forward. With OLS, we decompose Y into two non-overlapping components
Y = fit + error
R-square is based on the idea that the total sum of squares of y is partition neatly into two quantities - sum of square model and sum of square error.
With the spatial error model Y is broken up into three components
Y = trend + signal + noise
where trend = Xbeta
signal = labmdaWy - lambdaWXbeta
noise = Y - (trend + signal)
You would think that the total sum of square would also fit neatly into these three buckets, in which case you could easily calcluate the proportion of the total sum of squares contributed by each. However, I have not found this to be the case. The sum of squares from each of the three components do not add up to the total sum of squares of Y. Possibly becase the signal component involves an infinite expansion when you substitue Y into lambdaWYy
Y = Xbeta + lambda(Xbeta + labmda(Xbeta + lambda(..) - labmdaWXbeta + u) - lambdaWXbeta + u) - labmdaWXbeta + u.
I know that the predict.errorsarlm() function uses the obeserved Y vector in its calculation of signal, but why doesn't the sum of squares (i.e. sum((signal - mean(Y))^2), sum((u^2)), and sum((XB - mean(Y))^2) add up to the total sum of squares? (i.e. sum((Y - mean(Y))?
Any help would be greatly appreciated. Here is some example code in case that is helpful. Please note that the example below does not "isolate" the signal component, but combines it with the trend component.
#creating data
library(spdep)
#sample size
n <- 200
#coordinates
x_coord <- runif(n,0,10)
y_coord <- runif(n,0,10)
## w matrix and row normalized
w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{w_raw[i,j]<- 1/(sqrt((x_coord[i]-x_coord[j])^2 + (y_coord[i]-y_coord[j])^2))^2}}
diag(w_raw) <- rep(0,n)
row_sum <- rep(1,length(x_coord))
for(i in 1:length(x_coord))
{row_sum[i] <- sum(w_raw[i,]) }
w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{w[i,j] <- w_raw[i,j]/row_sum[i]}}
x <- rbinom(n,1,.5)
parms <- c(12.4,2)
w.listw <- mat2listw(w)
e <- rnorm(n,0,1)
p <- .6
y <- parms[1] + parms[2]*x + (solve(diag(n)- p*w))%*%e
sim.data <- as.data.frame(cbind(y,x))
names(sim.data) <- c("y","x")
error.mod <- errorsarlm(y~x,data = sim.data,w.listw)
summary(error.mod)
X <- as.matrix(cbind(rep(1,length(sim.data$y)),sim.data$x))
trend <- X%*%coef(error.mod)[-1]
signal <- error.mod$lambda*w%*%sim.data$y - error.mod$lambda*w%*%X%*%coef(error.mod)[-1]
fit <- trend + signal
error <- sim.data$y - fit
SST <- sum((sim.data$y - mean(sim.data$y))^2)
SSE <- sum(error^2)
SSM <- sum(((trend + signal)- mean(sim.data$y))^2)
SSE + SSM
SST
Samuel H. Field, Ph.D.
Philadelphia VA Medical Center
More information about the R-sig-Geo
mailing list