[R-sig-Geo] [DKIM] Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UNCLASSIFIED]

Joelle k. Akram chino_tones at hotmail.com
Wed Nov 22 02:12:43 CET 2017


No problem, Jin. I am looking for a regression model that is transparent, i.e., one where I can obtain the fitted regression coefficients (beta) for each covariate. Do you recommend any in spm?

Also, which do you think, from your experience, will have similar predictive performance (MAE) on both the training sample set and the hold-out test set?
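
As an illustration of the kind of transparent model asked about here, the following is a minimal sketch using base R's lm() on the Meuse data. It is not an spm function and not a recommendation from the thread; the object names (train, holdout, fit, mae) are made up for this example, and the back-transform with exp() is a simple choice with no bias correction.

```r
library(sp)   # provides the meuse data set

data(meuse)
set.seed(999)

# 70/30 split, similar to the split used in the original question
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train   <- meuse[train_ids, ]
holdout <- meuse[-train_ids, ]

# Ordinary least squares trend on the log scale (assumed model form,
# copied from the kriging formula in the question)
fit <- lm(log(zinc) ~ lead + copper + elev + dist, data = train)

# One fitted coefficient (beta) per covariate, plus the intercept
print(coef(fit))

# Mean absolute error on the original (zinc) scale
mae <- function(obs, pred) mean(abs(obs - pred))
mae_train   <- mae(train$zinc,   exp(predict(fit, newdata = train)))
mae_holdout <- mae(holdout$zinc, exp(predict(fit, newdata = holdout)))
print(c(train = mae_train, holdout = mae_holdout))
```

Because a plain regression does not interpolate the training points, its training and hold-out MAE are at least comparable quantities, which is not the case for an exact interpolator such as kriging.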

cheers,
Chris
________________________________
From: Li Jin <Jin.Li at ga.gov.au>
Sent: November 21, 2017 6:07 PM
To: Joelle k. Akram; r-sig-geo at r-project.org
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]


They are not yet.



From: Joelle k. Akram [mailto:chino_tones at hotmail.com]
Sent: Wednesday, 22 November 2017 11:56 AM
To: Li Jin; r-sig-geo at r-project.org
Subject: [DKIM] Re: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Jin,



Thank you for sharing. I was reading your paper, "Application of machine learning methods to spatial interpolation of environmental variables", on which the spm package is based.



In Table 1 from the paper you compare many algorithms. I was interested in assessing RKglm, RKgls, RKlm. Are these available in spm?



thanks

Chris



________________________________

From: Li Jin <Jin.Li at ga.gov.au>
Sent: November 21, 2017 5:33 PM
To: Joelle k. Akram; r-sig-geo at r-project.org
Subject: RE: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging? [SEC=UNCLASSIFIED]



Hi Chris,
The UK used here is usually called kriging with an external drift (KED). It is, in fact, a linear model plus kriging, and it assumes a linear relationship that usually does not hold. It has been tested in several studies and was outperformed by machine learning methods such as RF, RFOK, and RFIDW. I have released an R package, spm, to introduce these methods. It is easy to use, as demonstrated in vignette('spm').
Hope this helps.
Regards,
Jin
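
The description above of KED as "a linear model plus kriging" can be made concrete with a two-step regression-kriging sketch: fit a linear trend, krige its residuals, and add the two. This is only an illustration under stated assumptions. It is not the RKlm implementation from the paper, it is not part of spm, and it is not numerically identical to KED as fitted by gstat; all object names are hypothetical.

```r
library(sp)
library(gstat)

data(meuse)
set.seed(999)
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train   <- meuse[train_ids, ]
holdout <- meuse[-train_ids, ]

# Step 1: linear trend on the log scale; coef(trend) gives the betas
trend <- lm(log(zinc) ~ lead + copper + elev + dist, data = train)
train$res     <- residuals(trend)
holdout$trend <- predict(trend, newdata = holdout)

# Promote both sets to spatial objects
coordinates(train)   <- c("x", "y")
coordinates(holdout) <- c("x", "y")

# Step 2: variogram of the residuals, then ordinary kriging of residuals
v_emp  <- variogram(res ~ 1, train)
v_fit  <- fit.variogram(v_emp, vgm("Exp"))
res_ok <- krige(res ~ 1, train, holdout, model = v_fit)

# Step 3: trend plus kriged residual, back-transformed with exp()
pred_log    <- holdout$trend + res_ok$var1.pred
mae_holdout <- mean(abs(holdout$zinc - exp(pred_log)))
print(coef(trend))
print(mae_holdout)
```

The trend coefficients stay directly readable from the lm object, which is the main practical difference from calling krige() with a covariate formula.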

-----Original Message-----
From: R-sig-Geo [mailto:r-sig-geo-bounces at r-project.org] On Behalf Of Joelle k. Akram
Sent: Wednesday, 22 November 2017 11:08 AM
To: r-sig-geo at r-project.org
Subject: [DKIM] [R-sig-Geo] Fw: Why is there a large predictive difference forUniv. Kriging?




https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging


I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in machine learning where the data are partitioned into a training set and a hold-out set. The training set is used to define the regression model and the semivariogram.

I employ UK to predict on both the training sample set and the hold-out sample set. However, the mean absolute errors (MAE) between the predictions of the response variable (i.e., zinc for the Meuse dataset) and the actual values are very different. I would expect them to be similar, or at least closer. So far I have MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below, and advice is welcome.

library(sp)
library(gstat)
data(meuse)
dataset= meuse
set.seed(999)

# Split Meuse Dataset into Training and HoldOut Sample datasets
Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))

Training_sample = dataset[Training_ids,]
Holdout_sample_allvars = dataset[-Training_ids,]

holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
Hold_out_sample = holdoutvars_df[-Training_ids,]

coordinates(Training_sample) <- c('x','y')
coordinates(Hold_out_sample) <- c('x','y')

# Semivariogram modeling
m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
m <- vgm("Exp")
m <- fit.variogram(m1, m)


# Apply Univ Krig to Training dataset
prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m)
prediction_training_data <- expm1(prediction_training_data$var1.pred)

# Apply Univ Krig to Hold Out dataset
prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m)
prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)

# Computing Predictive errors for Training and Hold Out samples respectively
training_prediction_error_term <- Training_sample$zinc - prediction_training_data
holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data



# Function that returns Mean Absolute Error
mae <- function(error) {
  mean(abs(error))
}

# Mean Absolute Error metric :
# UK Predictive errors for Training sample set , and UK Predictive Errors for HoldOut sample set
print(mae(training_prediction_error_term)) #Error for Training sample set
print(mae(holdout_prediction_error_term)) #Error for Hold out sample set
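
One plausible reading of the reported training MAE of exactly 1 is sketched below. Kriging in gstat reproduces the observed values at the data locations, so predicting on the training set returns log(zinc) itself; back-transforming with expm1() (the inverse of log1p(), not of log()) then yields zinc - 1. Under that reading the training MAE says nothing about predictive skill, and a cross-validated error from krige.cv() is a fairer figure to set beside the hold-out MAE. The object names and the choice of 5 folds are assumptions made for this example.

```r
library(sp)
library(gstat)

data(meuse)
set.seed(999)
train_ids <- sample(seq_len(nrow(meuse)), size = floor(0.7 * nrow(meuse)))
train <- meuse[train_ids, ]
coordinates(train) <- c("x", "y")

f     <- log(zinc) ~ lead + copper + elev + dist
v_fit <- fit.variogram(variogram(f, train), vgm("Exp"))

# Predicting at the training locations reproduces the observations
uk_train <- krige(f, train, train, model = v_fit)
max_gap  <- max(abs(uk_train$var1.pred - log(train$zinc)))
print(max_gap)   # expected to be numerically negligible

# expm1(log(z)) equals z - 1, whereas exp(log(z)) equals z
mae_expm1 <- mean(abs(train$zinc - expm1(uk_train$var1.pred)))
mae_exp   <- mean(abs(train$zinc - exp(uk_train$var1.pred)))
print(c(expm1 = mae_expm1, exp = mae_exp))

# 5-fold cross-validation within the training set (fold count is arbitrary)
cv     <- krige.cv(f, train, model = v_fit, nfold = 5)
cv_mae <- mean(abs(exp(cv$observed) - exp(cv$var1.pred)))
print(cv_mae)
```

Note that the variogram here is fitted once on all training points and reused in every fold, so the cross-validated error is still somewhat optimistic.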


cheers,

Kristopher (Chris)

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.
-------------------------------------------------------------------------------------------------------------------------



