[R-sig-Geo] Spatial Error Model - ANOVA table

Roger Bivand Roger.Bivand at nhh.no
Thu Aug 21 14:12:30 CEST 2008


On Wed, 20 Aug 2008, Samuel Field wrote:

> List,
>
> Is it possible to decompose the variance of an outcome into trend, 
> signal and noise components using a SAR model - analogous to what one 
> would get with an OLS model? This doesn't seem to be straightforward. 
> With OLS, we decompose Y into two non-overlapping components
>
> Y = fit + error
>
> R-squared is based on the idea that the total sum of squares of y 
> partitions neatly into two quantities - the model sum of squares and 
> the error sum of squares.
>
>
> With the spatial error model Y is broken up into three components
>
> Y = trend + signal + noise
>
>
> where trend = Xbeta
>      signal = lambdaWy - lambdaWXbeta
>      noise = Y - (trend + signal)
>
> You would think that the total sum of squares would also fit neatly into 
> these three buckets, in which case you could easily calculate the 
> proportion of the total sum of squares contributed by each.

In fact, I think it is easier to think of this in generalised least 
squares terms. For fitting the coefficients, we can abstract from the 
covariance matrix of the observations by first finding lambda by line 
search, and then using lm() to fit I(y - lambda * wy) ~ I(x - lambda * WX). 
You can see this by looking at the lm.target component of the sarlm object 
returned by errorsarlm().
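
To make this concrete, here is a minimal sketch (reusing the w, sim.data, X 
and error.mod objects from your example code below) that reproduces the 
lm.target fit by hand:

lambda <- error.mod$lambda
## spatially filter the response and the design matrix with the fitted lambda
y_f <- drop(sim.data$y - lambda * (w %*% sim.data$y))
X_f <- X - lambda * (w %*% X)
## OLS on the filtered variables; the coefficients should agree with
## coef(error.mod)[-1] and with coef(error.mod$lm.target)
gls.fit <- lm(y_f ~ X_f - 1)
coef(gls.fit)
coef(error.mod$lm.target)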

By the way, your error, trend, and signal components agree with those from 
predict.sarlm():

## add set.seed(1) at the top of your script to make the checks reproducible
pred <- predict(error.mod)
## the fitted values, signal and trend from predict.sarlm() match your hand-built ones
all.equal(c(fit), unclass(pred), check.attributes=FALSE)
all.equal(c(signal), attr(pred, "signal"), check.attributes=FALSE)
all.equal(c(trend), attr(pred, "trend"), check.attributes=FALSE)
## and the model residuals match your hand-built error component
all.equal(residuals(error.mod), c(error), check.attributes=FALSE)

## in the filtered (GLS) space the ANOVA decomposition is exact:
anova(error.mod$lm.target)
## fitted SS + residual SS equals the (uncentered) total SS of the filtered response
crossprod(fitted(error.mod$lm.target)) + 
  crossprod(residuals(error.mod$lm.target))
crossprod(error.mod$lm.target$model$"I(y - lambda * wy)")
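
If a single summary number is wanted, a descriptive possibility is to express 
the split in that filtered space, e.g.:

## sums of squares in the filtered space: model + error = total
yf <- error.mod$lm.target$model$"I(y - lambda * wy)"
SSE.f <- sum(residuals(error.mod$lm.target)^2)
SST.f <- sum(yf^2)   # uncentered, as in the crossprod() check above
c(model = SST.f - SSE.f, error = SSE.f, total = SST.f)
## a pseudo R-squared in this space is then (SST.f - SSE.f) / SST.f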

However, it is more difficult to get back to the untransformed space of 
the original model in terms of the response; the fitted values, 
residuals, etc. can be recovered, but the variance/covariance components 
are harder to partition because of the autocorrelation. There is a 
discussion on pp. 563-564 in Cressie (1993), and sections 6.2.2 and 6.2.3 
(pp. 335-348) in Schabenberger and Gotway (2005) are relevant here. 
Current practice is typically to use the log-likelihood value and measures 
derived from it to assess how the model is doing. I found looking at 
lm.gls() in MASS instructive for following up some of these issues.
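
On your specific numbers: in the untransformed space the fitted values and 
residuals are no longer orthogonal, so the shortfall of SSE + SSM relative 
to SST is exactly the cross-product term. A quick check, reusing SST, SSE, 
SSM, fit and error from your code below:

## algebraic identity: SST = SSM + SSE + 2 * sum((fit - mean(y)) * error);
## the cross term is what the OLS-style decomposition silently drops
cross <- 2 * sum((fit - mean(sim.data$y)) * error)
SSM + SSE + cross
SST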

Not really an answer, but maybe others can help re-frame the question to 
shed more light on the issues involved.

Roger

> However, I have not found this to be the case.  The sums of squares of 
> the three components do not add up to the total sum of squares of Y.  
> Possibly because the signal component involves an infinite expansion 
> when you substitute Y into lambdaWy:
>
> Y = Xbeta + lambdaW(Xbeta + lambdaW(Xbeta + lambdaW(...) - lambdaWXbeta + u) 
>     - lambdaWXbeta + u) - lambdaWXbeta + u
>
>
> I know that the predict.sarlm() method uses the observed Y vector in 
> its calculation of signal, but why don't the sums of squares 
> (i.e. sum((signal - mean(Y))^2), sum(u^2), and sum((XB - mean(Y))^2)) 
> add up to the total sum of squares (i.e. sum((Y - mean(Y))^2))?
>
>
> Any help would be greatly appreciated.  Here is some example code in 
> case that is helpful.  Please note that the example below does not 
> "isolate" the signal component, but combines it with the trend 
> component.
>
>
> #creating data
> library(spdep)
>
> #sample size
> n <- 200
>
> #coordinates
>
> x_coord <- runif(n,0,10)
> y_coord <- runif(n,0,10)
>
>
> ## w matrix and row normalized
>
> w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
>
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
> {w_raw[i,j]<- 1/(sqrt((x_coord[i]-x_coord[j])^2 + (y_coord[i]-y_coord[j])^2))^2}}
>
>
> diag(w_raw) <- rep(0,n)
>
>
> row_sum <- rep(1,length(x_coord))
> for(i in 1:length(x_coord))
>        {row_sum[i] <- sum(w_raw[i,]) }
>
> w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
>
> for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
>        {w[i,j] <- w_raw[i,j]/row_sum[i]}}
>
> x <- rbinom(n,1,.5)
>
> parms <- c(12.4,2)
> w.listw <- mat2listw(w)
>
> e <- rnorm(n,0,1)
> p <- .6
>
> y <- parms[1] + parms[2]*x + (solve(diag(n)- p*w))%*%e
>
> sim.data <- as.data.frame(cbind(y,x))
> names(sim.data) <- c("y","x")
> error.mod <- errorsarlm(y~x,data = sim.data,w.listw)
>
> summary(error.mod)
>
> X <- as.matrix(cbind(rep(1,length(sim.data$y)),sim.data$x))
>
>
> trend <- X%*%coef(error.mod)[-1]
> signal <-  error.mod$lambda*w%*%sim.data$y - error.mod$lambda*w%*%X%*%coef(error.mod)[-1]
> fit <- trend + signal
> error <-  sim.data$y - fit
>
>
> SST <- sum((sim.data$y - mean(sim.data$y))^2)
> SSE <- sum(error^2)
> SSM <- sum(((trend + signal)- mean(sim.data$y))^2)
>
>
> SSE + SSM
> SST
>
>
> Samuel H. Field, Ph.D.
> Philadelphia VA Medical Center
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



