[R-sig-Geo] Using stsls() with predict.sarlm()

Mon Jul 13 11:38:05 CEST 2020

Dear All,

I am using the "spatialreg" package to estimate a spatial autoregressive model for a large number of observations (~75.000) and would thus prefer to use stsls() over lagsarlm() to gain estimation speed. In a second step however, I want to use the model for out of sample prediction with predict.sarlm().

As predict.sarlm() is not designed for stsls() outputs I tried to modify the stsls() function outputs in a way that the prediction function would be able to handle them.

Now I have two question:

- Is it, from a theoretical point of view, legitimate to just use the IV estimates instead of the ML estimates for the predictors described in

Goulard et al. (2017)?

- Is the following modification of the original stsls() function in combination with the predict.sarlm() an acceptable approach to implement the prediction methods suggested by Goulard et al. (2017) with an IV estimator? (Unfortunately I could not come up with a way to test this myself)

Modifications + test predictions in R:

[modifications are marked with "## additional outputs start ##" and  "## additional outputs end ##" ]

library("spdep")
library("spatialreg")

# modify stsls function outputs
stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
                  na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
  if (!inherits(listw, "listw"))
    stop("No neighbourhood list")

  if (is.null(zero.policy))
    zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
  stopifnot(is.logical(zero.policy))
  if (!inherits(formula, "formula")) formula <- as.formula(formula)
  mt <- terms(formula, data = data)
  mf <- lm(formula, data, na.action=na.action, method="model.frame")
  na.act <- attr(mf, "na.action")
  if (!is.null(na.act)) {
    subset <- !(1:length(listw$neighbours) %in% na.act)
    listw <- subset(listw, subset, zero.policy=zero.policy)
  }

  y <- model.extract(mf, "response")
  if (any(is.na(y))) stop("NAs in dependent variable")
  X <- model.matrix(mt, mf)
  if (any(is.na(X))) stop("NAs in independent variable")
  if (robust) {
    if (is.null(HC)) HC <- "HC0"
    if (!any(HC %in% c("HC0", "HC1")))
      stop("HC must be one of HC0, HC1")
  }
  # modified to pass zero.policy Juan Tomas Sayago 100913
  Wy <- lag.listw(listw, y, zero.policy=zero.policy)
  dim(Wy) <- c(nrow(X),1)
  colnames(Wy) <- c("Rho")

  #    WX <- lag.listw(W,X[,2:ncol(X)])
  n <- NROW(X)
  m <- NCOL(X)
  xcolnames <- colnames(X)
  K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
  if (m > 1) {
    WX <- matrix(nrow=n, ncol=(m-(K-1)))
    if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
    for (k in K:m) {
      wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
      if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
      if (any(is.na(wx)))
        stop("NAs in lagged independent variable")
      WX[,(k-(K-1))] <- wx
      if(W2X) WWX[, (k - (K - 1))] <- wwx
    }
    if(W2X) inst <- cbind(WX, WWX)
    else inst <- WX
  }
  if (K == 2 && listw$style != "W") {
    # modified to meet other styles, email from Rein Halbersma
    wx1 <- as.double(rep(1, n))
    wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
    if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
    if (m > 1) {
      inst <- cbind(wx, inst)
      if(W2X) inst <- cbind(wwx, inst)
    } else {
      inst <- matrix(wx, nrow=n, ncol=1)
      if(W2X) inst <- cbind(inst, wwx)
    }
    #        colnames(inst) <- xcolnames

  }
  #    if (listw$style == "W") colnames(WX) <- xcolnames[-1]
  result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
                 legacy=legacy)
  result$zero.policy <- zero.policy
  result$robust <- robust
  if (robust) result$HC <- HC
  result$legacy <- legacy
  result$listw_style <- listw$style
  result$call <- match.call()

  ## additional outputs ##
  result$rho <- result$coefficients[1]
  result$coefficients <- result$coefficients[2:length(result$coefficients)]
  result$type <- "lag"
  result$X <- X
  result$y <- y
  result$s2<-result$sse / result$df

  ## additional outputs end ##

  class(result) <- "stsls"
  result
}

# update original function in spatialreg
tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
environment(stsls2) <- environment(tmpfun)
attributes(stsls2) <- attributes(tmpfun)
assignInNamespace("stsls", stsls2, ns="spatialreg")

#-# test prediction with updated function

# data
data(oldcol)

# set seed
set.seed(123)

# draw observations to use as test data
outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)

# create train and test dataframes
S.sample <- COL.OLD[-outsamp, ] # train
O.sample <- COL.OLD[outsamp, ] # test

S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train

# run regression
testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
                     nb2listw(S.sample.nb, style="W"))

# in and outsample prediction
S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
                                       pred.type = "TC", power = FALSE)

O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
                                           listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)

O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
                                           listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)

I would be very glad to get some feedback on both the general idea and the implementation!

Best regards,

Selim Banabak

	[[alternative HTML version deleted]]