[R-sig-Geo] Fw: Why is there a large predictive difference for Univ. Kriging?

Joelle k. Akram chino_tones at hotmail.com
Wed Nov 22 01:08:22 CET 2017




down votefavorite<https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging#>


I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in Machine Learning where data is partioned into a Train set and Hold out set. The Train set is used for defining the regressive model and defining the semivariogram.

I employ UK to predict on both the Train sample set, as well as the Hold Out sample set. However, there mean absolute error (MAE) from the predictions of the response variable (i.e., zinc for the Meuse dataset) and actual values are very different. I would expect them to be similar or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below and advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset= meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))

Training_sample = dataset[Training_ids,]
Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)


# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m)
prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data



# Function that returns Mean Absolute Error
mae <- function(error)
{
  mean(abs(error))
}

# Mean Absolute Error metric :
# UK Predictive errors for Training sample set , and UK Predictive Errors for HoldOut sample set
print(mae(training_prediction_error_term)) #Error for Training sample set
print(mae(holdout_prediction_error_term)) #Error for Hold out sample set


cheers,

Kristopher (Chris)

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list