[R-sig-Geo] Using stsls() with predict.sarlm()

Mon Jul 13 14:43:27 CEST 2020

On Mon, 13 Jul 2020, Banabak, Selim wrote:

> Dear All,
>
>
> I am using the "spatialreg" package to estimate a spatial autoregressive 
> model for a large number of observations (~75.000) and would thus prefer 
> to use stsls() over lagsarlm() to gain estimation speed. In a second 
> step however, I want to use the model for out of sample prediction with 
> predict.sarlm().
>
> As predict.sarlm() is not designed for stsls() outputs I tried to modify 
> the stsls() function outputs in a way that the prediction function would 
> be able to handle them.
>
>
> Now I have two question:
>
> - Is it, from a theoretical point of view, legitimate to just use the IV
>   estimates instead of the ML estimates for the predictors described in
>   Goulard et al. (2017)?

You would need to go through the underlying equations. Just using the 
point estimates of the coefficients and their standard errors is one 
approach, but there are few studies beyond Goulard et al. Maybe look at 
Suesse and co-author https://doi.org/10.1016/j.csda.2017.11.004, 
https://doi.org/10.1080/00949655.2017.1286495 (also ML - much of the 
STSLS/GMM literature avoids looking at important problems).

While estimating and fitting larger data sets is more time-consuming with 
ML, using LU or Cholesky decomposition is practical when the spatial 
weights are sparse (this applies to STSLS too). Predicting is also 
constrained when n is large, as inverting the nxn matrix may be needed.

Check whether Stata knows how to predict from STSLS, and check usage of 
sphet::spreg rather than spatialreg::stsls. Consider contacting the 
maintainer of sphet if there is no response on this list, as it would make 
more sense to explore predict methods for more modern sphet 
implementations than legacy ones in spatialreg.

>
> - Is the following modification of the original stsls() function in
>   combination with the predict.sarlm() an acceptable approach to
>   implement the prediction methods suggested by Goulard et al. (2017)
>   with an IV estimator? (Unfortunately I could not come up with a way to
>   test this myself)
>

You would need to do the matrix math separately - maybe contacting the 
author of the spatialreg implementations, Martin Gubri, would also make 
sense.

Interesting topic!

Roger

>
>
> Modifications + test predictions in R:
>
> [modifications are marked with "## additional outputs start ##" and  "## additional outputs end ##" ]
>
>
>
> library("spdep")
> library("spatialreg")
>
> # modify stsls function outputs
> stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
>                  na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
>  if (!inherits(listw, "listw"))
>    stop("No neighbourhood list")
>
>  if (is.null(zero.policy))
>    zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
>  stopifnot(is.logical(zero.policy))
>  if (!inherits(formula, "formula")) formula <- as.formula(formula)
>  mt <- terms(formula, data = data)
>  mf <- lm(formula, data, na.action=na.action, method="model.frame")
>  na.act <- attr(mf, "na.action")
>  if (!is.null(na.act)) {
>    subset <- !(1:length(listw$neighbours) %in% na.act)
>    listw <- subset(listw, subset, zero.policy=zero.policy)
>  }
>
>  y <- model.extract(mf, "response")
>  if (any(is.na(y))) stop("NAs in dependent variable")
>  X <- model.matrix(mt, mf)
>  if (any(is.na(X))) stop("NAs in independent variable")
>  if (robust) {
>    if (is.null(HC)) HC <- "HC0"
>    if (!any(HC %in% c("HC0", "HC1")))
>      stop("HC must be one of HC0, HC1")
>  }
>  # modified to pass zero.policy Juan Tomas Sayago 100913
>  Wy <- lag.listw(listw, y, zero.policy=zero.policy)
>  dim(Wy) <- c(nrow(X),1)
>  colnames(Wy) <- c("Rho")
>
>  #    WX <- lag.listw(W,X[,2:ncol(X)])
>  n <- NROW(X)
>  m <- NCOL(X)
>  xcolnames <- colnames(X)
>  K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
>  if (m > 1) {
>    WX <- matrix(nrow=n, ncol=(m-(K-1)))
>    if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
>    for (k in K:m) {
>      wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
>      if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
>      if (any(is.na(wx)))
>        stop("NAs in lagged independent variable")
>      WX[,(k-(K-1))] <- wx
>      if(W2X) WWX[, (k - (K - 1))] <- wwx
>    }
>    if(W2X) inst <- cbind(WX, WWX)
>    else inst <- WX
>  }
>  if (K == 2 && listw$style != "W") {
>    # modified to meet other styles, email from Rein Halbersma
>    wx1 <- as.double(rep(1, n))
>    wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
>    if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
>    if (m > 1) {
>      inst <- cbind(wx, inst)
>      if(W2X) inst <- cbind(wwx, inst)
>    } else {
>      inst <- matrix(wx, nrow=n, ncol=1)
>      if(W2X) inst <- cbind(inst, wwx)
>    }
>    #        colnames(inst) <- xcolnames
>
>  }
>  #    if (listw$style == "W") colnames(WX) <- xcolnames[-1]
>  result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
>                 legacy=legacy)
>  result$zero.policy <- zero.policy
>  result$robust <- robust
>  if (robust) result$HC <- HC
>  result$legacy <- legacy
>  result$listw_style <- listw$style
>  result$call <- match.call()
>
>  ## additional outputs ##
>  result$rho <- result$coefficients[1]
>  result$coefficients <- result$coefficients[2:length(result$coefficients)]
>  result$type <- "lag"
>  result$X <- X
>  result$y <- y
>  result$s2<-result$sse / result$df
>
>  ## additional outputs end ##
>
>  class(result) <- "stsls"
>  result
> }
>
>
> # update original function in spatialreg
> tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
> environment(stsls2) <- environment(tmpfun)
> attributes(stsls2) <- attributes(tmpfun)
> assignInNamespace("stsls", stsls2, ns="spatialreg")
>
>
> #-# test prediction with updated function
>
> # data
> data(oldcol)
>
> # set seed
> set.seed(123)
>
> # draw observations to use as test data
> outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)
>
> # create train and test dataframes
> S.sample <- COL.OLD[-outsamp, ] # train
> O.sample <- COL.OLD[outsamp, ] # test
>
> S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train
>
> # run regression
> testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
>                     nb2listw(S.sample.nb, style="W"))
>
> # in and outsample prediction
> S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
>                                       pred.type = "TC", power = FALSE)
>
>
> O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
>                                           listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)
>
> O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
>                                           listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)
>
>
>
>
> I would be very glad to get some feedback on both the general idea and the implementation!
>
>
> Best regards,
>
> Selim Banabak
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en