[R-sig-Geo] Find the optimal model using step wise linear regression
Nikolaos Tziokas
n|ko@@tz|ok@@ @end|ng |rom gm@||@com
Sat May 11 19:25:54 CEST 2024
I have a sf object with 2 columns, named fclass and geometry. I am
interested in the fclass column. The fclass has several unique values. I
convert this sf object to spatraster, I resample it and I perform a linear
regression where I use another spatraster, called ntl, as my response
(lm(ntl ~ road)).
My goal is to find which unique values give the largest r-squared in the lm
model using stepwise linear regression whith backward elimination. My
initial try was:
library(pacman)
pacman::p_load(terra, sf, dplyr)
ntl <- rast("path/ntl.tif") # response
v <- st_read("path/road17.shp") # sf object
# baseline r2
vterra <- vect(v)
ref <- rast("path/pop.tif") # get ext and pixel size
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m")
x_res <- resample(x, ntl, "average")
s <- c(ntl, x_res)
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude)
baseline <- sqrt(summary(m)$adj.r.squared)
orig_baseline <- sqrt(summary(m)$adj.r.squared)
classes <- unique(v$fclass)
inclasses <- unique(v$fclass)
i <- 1
while (i <= length(classes)) {
class <- classes[i]
print(paste0("current class ", class))
print(paste0("orig baseline: ", orig_baseline, " - baseline: ", baseline))
print(classes)
v_filtered <- v[v$fclass != class, ]
vterra <- vect(v_filtered)
r <- rast(v, res = res(ref), ext = ext(ref))
x <- rasterizeGeom(vterra, r, "length", "m")
x_res <- resample(x, ntl, "average")
s <- c(ntl, x_res)
names(s) <- c("ntl", "road")
m <- lm(ntl ~ road, s, na.action = na.exclude)
class_r2 <- sqrt(summary(m)$adj.r.squared)
if (class_r2 > baseline) {
classes <- classes[-i]
baseline <- class_r2
} else {
print("class_r2 is less than baseline")
print(paste0("class_r2: ", class_r2, " - baseline: ", baseline))
i <- i + 1
}
}
but it's not efficient (it takes ~ 5 minutes).
I have found the MASS package which has the lmStepAIC function but I am not
sure how to select the unique values from the fclass column to use as
predictiors.
head(v, 6)
Simple feature collection with 6 features and 1 field
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 598675.9 ymin: 7111459 xmax: 609432.8 ymax: 7118729
Projected CRS: WGS 84 / UTM zone 35S
fclass geometry
1 secondary MULTILINESTRING ((598675.9 ...
2 secondary MULTILINESTRING ((600641.7 ...
3 residential MULTILINESTRING ((601734.8 ...
4 residential MULTILINESTRING ((601163.9 ...
5 residential MULTILINESTRING ((601104.2 ...
6 motorway_link MULTILINESTRING ((609432.8 ...
The unique values in the fclass column:
unique(v$fclass)
[1] "secondary" "residential" "motorway_link" "service"
"primary" "unclassified" "motorway"
[8] "tertiary" "trunk" "primary_link" "footway"
"track" "secondary_link" "unknown"
[15] "living_street" "pedestrian" "path" "bridleway"
"steps" "trunk_link" "track_grade1"
[22] "track_grade3" "track_grade5" "cycleway" "track_grade4"
"tertiary_lin
The dataset can be downloaded from my GoogleDrive
<https://drive.google.com/drive/folders/1a0Khhej4koA0uhYTrQmqmMg02qfRWbjt?usp=drive_link>
.
Thank you.
--
Tziokas Nikolaos
Cartographer
Tel:(+44)07561120302
LinkedIn <http://linkedin.com/in/nikolaos-tziokas-896081130>
[[alternative HTML version deleted]]
More information about the R-sig-Geo
mailing list