[R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]

Wed Nov 22 01:33:12 CET 2017

Hi Chris,
The UK used here is usually called kriging with an external drift (KED). It, in fact, is a linear model plus kriging, which assumes linear relationship that is usually not true. It has been tested in several studies and was outperformed by machine learning methods like RF, RFOK, RFIDW etc. I have release an R package, spm, to introduce these methods. It is easy to use as demonstrated in vignette('spm').
Hope this helps.
Regards,
Jin

-----Original Message-----
From: R-sig-Geo [mailto:r-sig-geo-bounces at r-project.org] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: r-sig-geo at r-project.org
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging?

down votefavorite<https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging#>

I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in Machine Learning where data is partioned into a Train set and Hold out set. The Train set is used for defining the regressive model and defining the semivariogram.

I employ UK to predict on both the Train sample set, as well as the Hold Out sample set. However, there mean absolute error (MAE) from the predictions of the response variable (i.e., zinc for the Meuse dataset) and actual values are very different. I would expect them to be similar or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below and advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset= meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))

Training_sample = dataset[Training_ids,] Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample) m <- vgm("Exp") m <- fit.variogram(m1, m)

# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m) prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m) prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively training_prediction_error_term <- Training_sample$zinc - prediction_training_data holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data

# Function that returns Mean Absolute Error mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric :
# UK Predictive errors for Training sample set , and UK Predictive Errors for HoldOut sample set
print(mae(training_prediction_error_term)) #Error for Training sample set
print(mae(holdout_prediction_error_term)) #Error for Hold out sample set

cheers,

Kristopher (Chris)

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.