[R-sig-Geo] Spatial random forest prediction: Error when predicting at unseen locations at finer spatial scale

Nikolaos Tziokas
Wed Feb 22 21:36:21 CET 2023


I am using the package spatialRF in R for a spatial random forest
regression (SRFR) task. I have one response variable and 4 predictors and I
am performing SRFR at a coarse spatial scale. My goal is to take the model
parameters and apply them to a finer spatial resolution in order to predict
the response variable at the finer spatial scale.

When I run

p <- stats::predict(object = model.spatial,  # the spatialRF model
                    data = s,                # data.frame with the predictors at the fine spatial scale (no NaN values)
                    type = "response")$predictions

I get this error:

Error in predict.ranger.forest(forest, data, predict.all, num.trees, type, :
  Error: One or more independent variables not found in data.

I have checked the column names of s and of my original data.frame (the one I
used to build the model at the coarse scale) and they are the same. How can
I use the model I created at the coarse scale to predict the response
variable at a finer spatial scale?
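
The column-name check described above can be made explicit by comparing the new data against the variables the fitted forest itself expects, rather than against the original data.frame. The sketch below is only an illustration: missing_predictors() is a hypothetical helper, the toy column names are made up, and the commented-out last line assumes the fitted object keeps ranger's usual forest$independent.variable.names slot. If the fitted model's list contains columns that were added during fitting, they would show up here even though s and block.data have identical names.

```r
# Hypothetical helper: names the model expects that are absent from newdata.
missing_predictors <- function(expected, newdata) {
  setdiff(expected, colnames(newdata))
}

# Toy illustration with made-up names (not the real data)
expected <- c("a", "b", "c", "d", "extra_col")
newdata  <- data.frame(a = 1, b = 2, c = 3, d = 4)
missing_predictors(expected, newdata)

# With the objects from this post it might look like this (assumed slot name):
# missing_predictors(model.spatial$forest$independent.variable.names, s)
```

An empty result would mean every expected variable is present in the new data; any names returned are the ones predict() cannot find.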

Here is the code:

library(spatialRF)
library(stats)

wd = "path/"

block.data = read.csv(paste0(wd, "block.data.csv")) # coarse resolution

#names of the response variable and the predictors
dependent.variable.name <- "ntl"
predictor.variable.names <- colnames(block.data)[4:7]

#coordinates of the cases
xy <- block.data[, c("x", "y")]

block.data$x <- NULL
block.data$y <- NULL

#distance matrix between the case coordinates
distance.matrix <- as.matrix(dist(xy))
min(distance.matrix)
max(distance.matrix)

#distance thresholds (same units as distance.matrix)
distance.thresholds <- c(0, 20, 50, 100, 200, 500)

#random seed for reproducibility
random.seed <- 456

#creating and registering the cluster
local.cluster <- parallel::makeCluster(
  parallel::detectCores() - 1,
  type = "PSOCK")
doParallel::registerDoParallel(cl = local.cluster)

# fitting a non-spatial Random Forest
model.non.spatial <- spatialRF::rf(
  data = block.data,
  dependent.variable.name = dependent.variable.name,
  predictor.variable.names = predictor.variable.names,
  distance.matrix = distance.matrix,
  distance.thresholds = distance.thresholds,
  xy = xy,
  seed = random.seed,
  verbose = FALSE)

# Fitting a spatial model with rf_spatial()
model.spatial <- spatialRF::rf_spatial(
  model = model.non.spatial,
  method = "mem.moran.sequential",
  verbose = FALSE,
  seed = random.seed)

#stopping the cluster
parallel::stopCluster(cl = local.cluster)

# prediction at a finer spatial scale
s = read.csv(paste0(wd, "s.csv")) # data.frame containing the predictors at the fine scale

p <- stats::predict(object = model.spatial,
                    data = s,
                    type = "response")$predictions

I tried solutions like:

levels(s$lc) <- levels(block.data$lc)

in case I had missing land cover types in the lc column between the spatial
scales, but it didn't work.
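
As a side note on the attempt above, assigning with levels<- relabels the existing levels by position instead of matching them by name, so it can silently change values when the two factors do not share the same level set. The sketch below uses made-up land cover classes (not the real lc column) to contrast that with rebuilding the factor against the coarse-scale levels. It assumes lc is a factor in both data.frames; with read.csv() in R >= 4.0 strings are read as character by default, in which case levels() returns NULL.

```r
# Made-up land cover classes for illustration only
coarse_lc <- factor(c("forest", "urban", "water"))
fine_lc   <- factor(c("urban", "water", "urban"))  # "forest" absent at the fine scale

# Positional relabelling: "urban" becomes "forest" and "water" becomes "urban"
relabelled <- fine_lc
levels(relabelled) <- levels(coarse_lc)
as.character(relabelled)

# Name-based remapping: values are kept, the level set matches the coarse data
remapped <- factor(as.character(fine_lc), levels = levels(coarse_lc))
as.character(remapped)
levels(remapped)
```

The second form keeps each observation's class and only widens the level set, which is usually what is wanted when aligning prediction data with training data.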

You can download the two data.frames from here:
<https://drive.google.com/drive/folders/1KhnQEajpSKh59XuWkxTZcc_2YxPcxYW7?usp=sharing>
