[R-sig-Geo] Find the optimal model using step wise linear regression

Nikolaos Tziokas n|ko@@tz|ok@@ @end|ng |rom gm@||@com
Sat May 11 19:25:54 CEST 2024


I have a sf object with 2 columns, named fclass and geometry. I am
interested in the fclass column. The fclass has several unique values. I
convert this sf object to spatraster, I resample it and I perform a linear
regression where I use another spatraster, called ntl, as my response
(lm(ntl ~ road)).

My goal is to find which unique values give the largest r-squared in the lm
model using stepwise linear regression whith backward elimination. My
initial try was:

library(pacman)
pacman::p_load(terra, sf, dplyr)

ntl <- rast("path/ntl.tif") # response

v <- st_read("path/road17.shp") # sf object

# baseline r2
vterra <- vect(v)

ref <- rast("path/pop.tif") # get ext and pixel size

r <- rast(v, res = res(ref), ext = ext(ref))

x <- rasterizeGeom(vterra, r, "length", "m")

x_res <- resample(x, ntl, "average")

s <- c(ntl, x_res)
names(s) <- c("ntl", "road")

m <- lm(ntl ~ road, s, na.action = na.exclude)
baseline <- sqrt(summary(m)$adj.r.squared)
orig_baseline <- sqrt(summary(m)$adj.r.squared)

classes <- unique(v$fclass)
inclasses <- unique(v$fclass)

i <- 1
while (i <= length(classes)) {
  class <- classes[i]
  print(paste0("current class ", class))
  print(paste0("orig baseline: ", orig_baseline, " - baseline: ", baseline))
  print(classes)

  v_filtered <- v[v$fclass != class, ]
  vterra <- vect(v_filtered)
  r <- rast(v, res = res(ref), ext = ext(ref))
  x <- rasterizeGeom(vterra, r, "length", "m")

  x_res <- resample(x, ntl, "average")

  s <- c(ntl, x_res)
  names(s) <- c("ntl", "road")

  m <- lm(ntl ~ road, s, na.action = na.exclude)
  class_r2 <- sqrt(summary(m)$adj.r.squared)

  if (class_r2 > baseline) {
    classes <- classes[-i]
    baseline <- class_r2
  } else {
    print("class_r2 is less than baseline")
    print(paste0("class_r2: ", class_r2, " - baseline: ", baseline))
    i <- i + 1
  }
}
but it's not efficient (it takes ~ 5 minutes).

I have found the MASS package which has the lmStepAIC function but I am not
sure how to select the unique values from the fclass column to use as
predictiors.

head(v, 6)
Simple feature collection with 6 features and 1 field
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 598675.9 ymin: 7111459 xmax: 609432.8 ymax: 7118729
Projected CRS: WGS 84 / UTM zone 35S
         fclass                       geometry
1     secondary MULTILINESTRING ((598675.9 ...
2     secondary MULTILINESTRING ((600641.7 ...
3   residential MULTILINESTRING ((601734.8 ...
4   residential MULTILINESTRING ((601163.9 ...
5   residential MULTILINESTRING ((601104.2 ...
6 motorway_link MULTILINESTRING ((609432.8 ...

The unique values in the fclass column:

unique(v$fclass)
 [1] "secondary"      "residential"    "motorway_link"  "service"
 "primary"        "unclassified"   "motorway"
 [8] "tertiary"       "trunk"          "primary_link"   "footway"
 "track"          "secondary_link" "unknown"
[15] "living_street"  "pedestrian"     "path"           "bridleway"
 "steps"          "trunk_link"     "track_grade1"
[22] "track_grade3"   "track_grade5"   "cycleway"       "track_grade4"
"tertiary_lin

The dataset can be downloaded from my GoogleDrive
<https://drive.google.com/drive/folders/1a0Khhej4koA0uhYTrQmqmMg02qfRWbjt?usp=drive_link>
.

Thank you.

-- 
Tziokas Nikolaos
Cartographer

Tel:(+44)07561120302
LinkedIn <http://linkedin.com/in/nikolaos-tziokas-896081130>

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list