mgwrsar package estimates linear and local linear models with or without spatial autocorrelation. However this vignette is mainly dedicated to canonical GWR and mixed GWR without spatial autocorrelation (see other vignettes for models with spatial autocorrelation):
\(y=\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (GWR)\)
\(y=\beta_c X_c+\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (Mixed-GWR)\)
For a comprehensive understanding of the estimation method, please
refer to:
Geniaux, G. and Martinetti, D. (2017). A new method
for dealing simultaneously with spatial autocorrelation and spatial
heterogeneity in regression models. Regional Science and Urban
Economics.
DOI:
10.1016/j.regsciurbeco.2017.04.001
The estimation of the aforementioned model can be performed using the
MGWRSAR wrapper function, which requires:
- A dataframe and a matrix of coordinates, or
- Objects such as SpatialPointsDataFrame
,
SpatialGridDataFrame
, SpatialPixelsDataFrame
,
or an sf
dataframe.
golden_search_bandwidth
function can be used to
estimate optimal bandwidths, considering either AICc or
Leave-One-Out Cross Validation (LOOCV).gauss
),
Epanechnikov (epane
), Bisquare (bisq
),
Trisquare (trisq
), Triangle (triangle
), and
Rectangle (rectangle
).Type='GD'
) or space-time coordinates
(Type='GDT'
).mgwrsar
package supports the estimation of
temporal GWR/Mixed-GWR models using the
Type='GDT'
kernel.doParallel
package.mgwrsar
models can be estimated using a
Target Points set, with a method for selecting an
optimal set of target points (based on a specified size) to enhance
computational efficiency.mgwrsar
PackageThe MGWRSAR function requires either:
sf
dataframe with embedded coordinates.For this example, we use the mydatasf
dataset, which
contains real estate data (single houses with land
plots) from the south of France. The dataset includes the
price and four covariates:
library(mgwrsar)
#> Loading required package: Rcpp
#> Loading required package: sp
#> Loading required package: leaflet
#> Loading required package: Matrix
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
## loading data example
data(mydatasf)
plot(mydatasf['price'],pch=19)
Estimating a canonical GWR or Mixed-GWR model requires selecting an optimal bandwidth for the chosen kernel. The mgwrsar package provides a wrapper that simplifies the process of identifying the optimal bandwidth across various model and kernel types, using either AICc or cross-validation (CV) criteria. While primarily designed for spatial kernels (Type=‘GD’), it can also be applied to space-time kernels (Type=‘GDT’) by specifying a fixed bandwidth for the temporal dimension.
The following example demonstrates how to search for the optimal bandwidth for GWR and MGWR (with a stationary intercept) models, using both AICc and CV criteria for adaptive and non-adaptive Gaussian kernels.
####
res_AIC_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('bisq'), Model = 'GWR',control=list(adaptive=F,criterion='AICc',verbose=F),lower.bound=100, upper.bound=50000,tolerance=0.001)
#> .
#> Warning non adapative bisq kernel leads to 2 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 4 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 25 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 51 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 90 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 36 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 67 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 45 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 48 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 55 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
res_AIC_non_adpative$minimum # 341.33
#> [1] 341.3378
res_AIC_non_adpative$ctime # 78.557
#> elapsed
#> 84.182
model_GWR1<-MGWRSAR(formula='price~year+footage+land_area',H=res_AIC_non_adpative$minimum,data = mydata,coords=coords, fixed_vars=NULL,kernels=c('bisq'),Model='GWR',control=list(adaptive=F,verbose=F,SE=T))
#> .
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
model_GWR1@ctime # 1.989
#> elapsed
#> 1.807
summary(model_GWR1)
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata,
#> coords = coords, fixed_vars = NULL, kernels = c("bisq"),
#> H = res_AIC_non_adpative$minimum, Model = "GWR", control = list(adaptive = F,
#> verbose = F, SE = T))
#> Model: GWR
#> Kernels function: bisq
#> Kernels adaptive: NO
#> Kernels type: GD
#> Bandwidth: 341.3378
#> Computation time: 1.807
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: NO
#> Use of Target Points: NO
#> Number of data points: 1403
#> Varying parameters: Intercept year footage land_area
#> Intercept year footage land_area
#> Min. -40292348.3 -97696.7 -11829.7 -1361.880
#> 1st Qu. -9713.8 1174.6 1285.0 18.824
#> Median 45382.9 4151.6 1667.0 41.015
#> Mean -55265.9 6614.2 2428.1 62.151
#> 3rd Qu. 78400.7 7458.8 2267.8 61.864
#> Max. 1111359.0 989116.5 278536.4 758.498
#> Effective degrees of freedom: 1060.233
#> AICc: 35900.44
#> Residual sum of squares: 1.063749e+13
#> RMSE: 87074.43
res_AIC_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='AICc',verbose=F,NN=500),lower.bound=5, upper.bound=500,tolerance=0.001)
res_AIC_adpative$minimum # 12
#> [1] 12
res_AIC_adpative$ctime # 11.049
#> elapsed
#> 8.601
model_GWR1<-res_AIC_adpative$model
summary(res_AIC_adpative$model)
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata,
#> coords = coords, fixed_vars = fixed_vars, kernels = c("gauss"),
#> H = c(res, H2), Model = Model, control = list(adaptive = T,
#> criterion = "AICc", verbose = F, NN = 500))
#> Model: GWR
#> Kernels function: gauss
#> Kernels adaptive: YES
#> Kernels type: GD
#> Bandwidth: 12
#> Computation time: 0.915
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: YES, 500 neighbors / 1403
#> Use of Target Points: NO
#> Number of data points: 1403
#> Varying parameters: Intercept year footage land_area
#> Intercept year footage land_area
#> Min. -535158.898 -10860.084 20.124 -25.810
#> 1st Qu. -23589.567 1559.144 1216.048 21.500
#> Median 41474.377 4004.207 1707.279 39.208
#> Mean 20638.185 4609.292 1854.983 87.229
#> 3rd Qu. 80582.162 6694.403 2340.756 65.917
#> Max. 335707.649 28716.374 4799.328 1696.391
#> Effective degrees of freedom: 1153.829
#> AICc: 36676.72
#> Residual sum of squares: 1.201268e+13
#> RMSE: 92531.8
#> Warning non adapative bisq kernel leads to 54 isolated points. OLS estimates are used for these points. It is recommended to use an adaptive or Gaussian kernel instead.
## optimal bandwidth using CV criteria
# optimization
resGWR_CV_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=F,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=50000,tolerance=0.001)
resGWR_CV_non_adpative$minimum # 449.1597
#> [1] 449.1597
resGWR_CV_non_adpative$ctime # 13.411
#> elapsed
#> 30.705
model_GWR1<-resGWR_CV_non_adpative$model
# summary of optimal model
summary(model_GWR1)
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata,
#> coords = coords, fixed_vars = fixed_vars, kernels = c("gauss"),
#> H = c(res, H2), Model = Model, control = list(adaptive = F,
#> criterion = "CV", verbose = F, NN = 500, doMC = F, ncore = 1))
#> Model: GWR
#> Kernels function: gauss
#> Kernels adaptive: NO
#> Kernels type: GD
#> Bandwidth: 449.1597
#> Computation time: 0.69
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: YES, 500 neighbors / 1403
#> Use of Target Points: NO
#> Number of data points: 1403
#> Varying parameters: Intercept year footage land_area
#> Intercept year footage land_area
#> Min. -487681.47 -6487.68 -451.26 -60.212
#> 1st Qu. -8997.79 2338.39 1583.00 36.349
#> Median 34566.69 3318.67 1778.99 39.739
#> Mean 10746.10 4600.36 1984.20 44.414
#> 3rd Qu. 52296.10 5859.57 2277.26 42.348
#> Max. 259184.65 23426.70 4643.83 185.571
#> Effective degrees of freedom: 1303.761
#> AICc: 36722.1
#> Residual sum of squares: 1.641726e+13
#> RMSE: 108173.7
plot(model_GWR1,type='B_coef',var='Intercept',mytile='OpenStreetMap',crs=mycrs,radius=150,myzoom=11)
## if we want the computation of Standard Error of coefficients, we can restimate the model using (SE=T) in control.
model_GWR1<-MGWRSAR(formula = 'price~year+footage+land_area', data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'),H=resGWR_CV_non_adpative$minimum, Model = 'GWR',control=list(adaptive=F,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1,SE=T))
summary(model_GWR1)
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata,
#> coords = coords, fixed_vars = NULL, kernels = c("gauss"),
#> H = resGWR_CV_non_adpative$minimum, Model = "GWR", control = list(adaptive = F,
#> criterion = "CV", verbose = F, NN = 500, doMC = F, ncore = 1,
#> SE = T))
#> Model: GWR
#> Kernels function: gauss
#> Kernels adaptive: NO
#> Kernels type: GD
#> Bandwidth: 449.1597
#> Computation time: 0.984
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: YES, 500 neighbors / 1403
#> Use of Target Points: NO
#> Number of data points: 1403
#> Varying parameters: Intercept year footage land_area
#> Intercept year footage land_area
#> Min. -487681.47 -6487.68 -451.26 -60.212
#> 1st Qu. -8997.79 2338.39 1583.00 36.349
#> Median 34566.69 3318.67 1778.99 39.739
#> Mean 10746.10 4600.36 1984.20 44.414
#> 3rd Qu. 52296.10 5859.57 2277.26 42.348
#> Max. 259184.65 23426.70 4643.83 185.571
#> Effective degrees of freedom: 1303.761
#> AICc: 36722.1
#> Residual sum of squares: 1.641726e+13
#> RMSE: 108173.7
## optimal bandwidth using AICc criteria
resMGWR_CV_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',H2=NULL,data = mydata,coords=coords, fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(verbose=F,NN=500,criterion='CV',adaptive=T,doMC=F,ncore=1),lower.bound=6, upper.bound=502,tolerance=0.001)
resMGWR_CV_adpative$minimum # 8
#> [1] 42
model_MGWR<-resMGWR_CV_adpative$model
## summary of optimal model
summary(model_MGWR)
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata,
#> coords = coords, fixed_vars = fixed_vars, kernels = c("gauss"),
#> H = c(res, H2), Model = Model, control = list(verbose = F,
#> NN = 500, criterion = "CV", adaptive = T, doMC = F, ncore = 1))
#> Model: MGWR
#> Kernels function: gauss
#> Kernels adaptive: YES
#> Kernels type: GD
#> Bandwidth: 42
#> Computation time: 1.06
#> Use of parallel computing: FALSE ncore = 1
#> Use of rough kernel: YES, 500 neighbors / 1403
#> Use of Target Points: NO
#> Number of data points: 1403
#> Constant parameters: XX
#> 4477.57
#> Varying parameters: Intercept footage land_area
#> Intercept footage land_area
#> Min. -124680.32 937.04 12.729
#> 1st Qu. -593.53 1466.60 33.800
#> Median 23206.29 1788.97 38.840
#> Mean 19093.52 1866.14 74.346
#> 3rd Qu. 47096.39 2073.61 50.753
#> Max. 116319.85 3726.99 1016.451
#> Effective degrees of freedom: 1342.651
#> AICc: 36768.38
#> Residual sum of squares: 1.806056e+13
#> RMSE: 113458.4
The mgwrsar_bootstrap_test function performs a bootstrap test for Beta coefficients in mgwrsar class models, with support for parallel computation. As this process can be time-consuming, it is not executed in this example.
# library(fda.usc) # for wild boostrap
D=as.matrix(dist(coords))
# Stationnarity Test should be conducted using optimal bandwidth
resGWR=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=502,tolerance=0.001)
model_GWR<-resGWR$model
resMGWR=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=502,tolerance=0.001)
model_MGWR<-resMGWR$model
model_GWR@AICc
model_MGWR@AICc
model_GWR@RMSE
model_MGWR@RMSE
test<-mgwrsar_bootstrap_test(model_MGWR,model_GWR,B=30,doMC=F,ncore=1,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star 0.2 --> we can not reject HO --> year is not varying over space
#$T -0.107674
# Significance Test of covariate "year"
resGWRc=golden_search_bandwidth(formula='price~footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=502,tolerance=0.001)
model_GWRc<-resGWRc$model
model_GWRc@AICc
model_GWR@AICc
model_GWRc@RMSE
model_GWR@RMSE
test<-mgwrsar_bootstrap_test(model_GWRc,model_GWR,B=30,doMC=F,ncore=1,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star 0 --> we can reject HO --> "year" is significant
#$T 3.82697
# Significance Test of covariate "random"
set.seed=1234
mydata$random=rnorm(nrow(coords))
resGWRnc=golden_search_bandwidth(formula='price~year+footage+land_area+random',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=502,tolerance=0.001)
model_GWRnc<-resGWRnc$model
model_GWRnc@AICc
model_GWR@AICc
model_GWRnc@RMSE
model_GWR@RMSE
test<-mgwrsar_bootstrap_test(model_GWR,model_GWRnc,B=30,doMC=F,ncore=1,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star 0.3666667 --> we can not reject HO --> "random" is not significant
#$T 1.756313
In this example, we use an in-sample of 800 observations for model estimation and an out-sample of 200 observations for prediction. Predictions for GWR and MGWR can be made either by extrapolation (faster) or by re-estimating the model using out-sample locations as focal points (more accurate).
For GWR and Mixed-GWR models, the most accurate approach for out-of-sample prediction is to re-estimate the model coefficients using out-sample locations as focal points. In this case, the estimated coefficients themselves are not used; only the model call and input data are employed. Alternatively, coefficients can be extrapolated using a weighting matrix, which is the only available method for models incorporating spatial autocorrelation (e.g., MGWR_1_0_kv, MGXWR_0_0_kv, MGXWR_1_kc_kv, MGXWR_1_kc_0) (see the vignette GWR-and-Mixed-GWR-with-spatial-autocorrelation.Rmd for details). Users can then also choose to extrapolate model coefficients using a Shepard kernel with a specified number of neighbors (method_pred=‘shepard’, k_extra=12), apply the same kernel and bandwidth as the estimated model (method_pred=‘model’), or use the method proposed by Geniaux (2024) (method_pred=‘tWtp_model’), which is the default when re-estimating model coefficients using out-sample locations as focal points is not possible.
# build insample and outsample folds
length_out=1200
index_in=sample(1:nrow(mydata),length_out)
index_out=(1:nrow(mydata))[-index_in]
# estimate a GWR model on an insample fold
res_AIC_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata[index_in,],coords=coords[index_in,], fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=500,tolerance=0.001)
res_AIC_non_adpative$minimum
#> [1] 36
model_GWR_insample1<-res_AIC_non_adpative$model
# build outsample data
newdata=mydata[index_out,]
newdata_coords=coords[index_out,]
## prediction of the GWR model on the newdata
## restimate the model coefficient using locations of out-sample as focal points
Y_pred1=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords)
head(Y_pred1)
#> [1] 158422.7 159122.2 273286.3 202679.0 154033.9 181500.4
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred1)^2)) # RMSE 110096
#> [1] 110096
## Prediction with method_pred='tWtp'
Y_pred2=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred2)
#> [1] 149887.6 154460.5 261276.1 190782.5 156923.0 175888.6
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred2)^2)) # RMSE 115043
#> [1] 115043
## Prediction with method_pred='model'
Y_pred3=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='model')
head(Y_pred3)
#> [1] 169177.7 158614.4 297628.6 199072.6 164298.7 180863.5
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred3)^2)) # RMSE 180813.1
#> [1] 180813.1
## Prediction with method_pred='shepard'
Y_pred4=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard')
head(Y_pred4)
#> [1] 158087.2 159010.8 269226.3 202717.5 154604.1 181443.6
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred4)^2)) # RMSE 111476
#> [1] 111476
### Model Mixed-GWR (wrong model...)
resMGWR_CV_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata[index_in,],coords=coords[index_in,], fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,doMC=F,ncore=1),lower.bound=5, upper.bound=500,tolerance=0.001)
res_AIC_non_adpative$minimum
#> [1] 36
model_MGWR_insample2<-resMGWR_CV_adpative$model
## restimate the model coefficient using locations of out-sample as focal points
Y_pred5=predict(model_MGWR_insample2, newdata=newdata, newdata_coords=newdata_coords)
#> Warnings: method_pred=='TP' is not implemented for Model in ('MGWR','MGWRSAR_1_kc_0','MGWRSAR_1_kc_kv','MGWRSAR_0_kc_kv','MGWRSAR_0_0_kv'): automatic swith to method_pred='tWtp_model'
head(Y_pred5)
#> [1] 4387021.4 336403.2 9486558.4 1925896.3 2086548.6 727895.3
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred5)^2)) # RMSE 8148959
#> [1] 8148959
## extrapolation of beta coefs with a shepard kernel
Y_pred6=predict(model_MGWR_insample2, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard')
head(Y_pred6)
#> [1] 4405202.5 340970.8 9495151.3 1967180.0 2078633.9 743689.7
head(mydata$price[index_out])
#> [1] 199000 165000 326000 265000 182000 305000
sqrt(mean((mydata$price[index_out]-Y_pred6)^2)) # RMSE 8150579
#> [1] 8150579
Brunsdon, C., Fotheringham, A., Charlton, M., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28 (4), 281–298.
Friedman, J. H. 1984. A variable span smoother. Laboratory for Computational Statistics, Department of Statistics, Stanford University: Technical Report(5).
Geniaux, G. and Martinetti, D. 2018. A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics, 72, 74-85. https://doi.org/10.1016/j.regsciurbeco.2017.04.001
Geniaux, G. and Martinetti, D. 2017b. R package mgwrsar: Functions for computing (mixed) geographycally weighted regression with spatial autocorrelation,. https://CRAN.R-project.org/package=mgwrsar.
Geniaux, G. 2024. Speeding up estimation of spatially varying coefficients models. Jounal of Geographical System 26, 293–327. https://doi.org/10.1007/s10109-024-00442-3
Goulard, M., Laurent, T., & Thomas-Agnan, C., 2017, “About predictions in spatial autoregressive models: Optimal and almost optimal strategies,” published in Spatial Economic Analysis, 12(2-3), 304-325
Loader, C. 1999. Local regression and likelihood, volume 47. springer New York.
McMillen, D. P. 1996. One hundred fifty years of land values in chicago: A nonparametric approach. Journal of Urban Economics, 40(1):100 – 124.