[R-sig-Geo] Fw: Why is there a large predictive difference for Univ. Kriging?

Wed Nov 22 16:17:50 CET 2017

On 2017-11-22 13:11, Joelle k. Akram wrote:
> 
> Thank you Tom. A couple questions.
> 1) In your code, you used log1p for computing  zinc.vgm. But log1p is 
> not used when defining trend.l for the krige.control 'KC'. Do we need 
> log1p for the zinc response (dependent) variable when defining trend.l?

Log-transformation in linkfit is defined by setting "lambda=0". I know 
it is a very cryptic package but it has all you need - transformation, 
back-transformation, REML fitting of variograms, trend components etc etc.

> 
> 2) The exponent back-transform is not applied anywhere after 
> applying log1p for computing the  zinc.vgm variable. Do we need exp 
> anywhere or is it done internally?

It is done internally.

> 
> 3) Do we add half the prediction variance to the 'zinc.uk' variable or 
> does geoR do this internally?

It is done internally see:
"krige.conv: performing the Box-Cox data transformation
krige.conv: back-transforming the predicted mean and variance"

> 
> 4) Is it more advisable to use likfit instead of variofit and why?

likfit has probably more options for variogram modelling (these could 
get quite computational and I think fitting vgms with >>1000 geoR is not 
recommended, while in gstat it is still doable), but it could be a 
matter of taste.

> 
> 5) A value of 800 is used to initialize likfit. Where is value determined?

Arbitrary initial parameter. It does not have to be very accurate.

> 
> appreciated,
> Chris
> 
> ------------------------------------------------------------------------
> *From:* R-sig-Geo <r-sig-geo-bounces at r-project.org> on behalf of 
> Tomislav Hengl <tom.hengl at gmail.com>
> *Sent:* November 22, 2017 3:58 AM
> *To:* r-sig-geo at r-project.org
> *Subject:* Re: [R-sig-Geo] Fw: Why is there a large predictive 
> difference for Univ. Kriging?
> 
> Hi Chris,
> 
> First of all, I think your back-transformation is not correct since you
> need to add half the prediction variance to values as indicated in the
> Diggle and Ribeiro (2007) P-61
> (https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/Diggle_Ribeiro_2007_P61.png 
> 
> 
> ).
> Otherwise you underpredict the values and hence the cross-validation
> error will be high.
> 
> I also do not see much point in using lead and copper as covariates
> since they are only available at sampling locations.
> 
> For log-normal or not-normal variables I advise using geoR package that
> does all the transformations for you (it would be interesting to see if
> gstat and geoR give exactly the same numbers if the same transformations
> and back-transformations are applied):
> 
> R> library(geoR)
> --------------------------------------------------------------
>     Analysis of Geostatistical Data
>     For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
> The geoR package - LEG-UFPR <http://www.leg.ufpr.br/geoR>
> www.leg.ufpr.br
> geoR is a free and open-source package for geostatistical analysis to be 
> used as a add-on to the R system
> 
> 
> 
>     geoR version 1.7-5.2 (built on 2016-05-02) is now loaded
> --------------------------------------------------------------
> 
> R> demo(meuse, echo=FALSE)
> R> set.seed(999)
> R> sel.d = complete.cases(meuse at data[,c("lead","copper","elev", "dist")])
> R> meuse = meuse[sel.d,]
> R> meuse.geo <- as.geodata(meuse["zinc"])
> R> ## add covariates:
> R> meuse.geo$covariate = meuse at data[,c("lead","copper","elev", "dist")]
> R> trend = ~ lead+copper+elev+dist
> R> zinc.vgm <- likfit(meuse.geo, lambda=0, trend = trend,
> ini=c(var(log1p(meuse.geo$data)),800), fix.psiA = FALSE, fix.psiR = FALSE)
> ---------------------------------------------------------------
> likfit: likelihood maximisation using the function optim.
> likfit: Use control() to pass additional
>             arguments for the maximisation function.
>            For further details see documentation for optim.
> likfit: It is highly advisable to run this function several
>            times with different initial values for the parameters.
> likfit: WARNING: This step can be time demanding!
> ---------------------------------------------------------------
> likfit: end of numerical maximisation.
> R> zinc.vgm
> likfit: estimated model parameters:
>         beta0      beta1      beta2      beta3      beta4      tausq
> sigmasq        phi       psiA
> "  6.0853" "  0.0033" "  0.0053" " -0.0810" " -0.9805" "  0.0210" "
> 0.0717" "799.9942" "  0.2619"
>          psiR
> "  3.9731"
> Practical Range with cor=0.05 for asymptotic range: 2396.568
> 
> likfit: maximised log-likelihood = -883.4
> R> locs2 = meuse at coords
> R> KC = krige.control(trend.d = trend, trend.l = ~
> meuse$lead+meuse$copper+meuse$elev+meuse$dist, obj.model = zinc.vgm)
> R> zinc.uk <- krige.conv(meuse.geo, locations=locs2, krige=KC)
> krige.conv: model with mean defined by covariates provided by the user
> krige.conv: anisotropy correction performed
> krige.conv: performing the Box-Cox data transformation
> krige.conv: back-transforming the predicted mean and variance
> krige.conv: Kriging performed using global neighbourhood
> 
> HTH,
> 
> Tom Hengl
> http://orcid.org/0000-0002-9921-5129
> Tomislav Hengl (0000-0002-9921-5129) - ORCID | Connecting ... 
> <http://orcid.org/0000-0002-9921-5129>
> orcid.org
> Your use of the Registry and the results of your search are subject to 
> ORCID's Terms and Conditions of Use
> 
> 
> 
> 
> 
> On 2017-11-22 01:08, Joelle k. Akram wrote:
>> 
>> 
>> 
>> down votefavorite<https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging#> 
> 
> <https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging#>
> 	
> Why is Predictive error large for Universal Kriging? 
> <https://stackoverflow.com/questions/47424740/why-is-predictive-error-large-for-universal-kriging#>
> stackoverflow.com
> I am using the Meuse dataset for universal kriging (UK) via the gstat 
> library in R. I am following a strategy used in Machine Learning where 
> data is partioned into a Train set and Hold out set. The...
> 
> 
> 
>> 
>> 
>> I am using the Meuse dataset for universal kriging (UK) via the gstat library in R. I am following a strategy used in Machine Learning where data is partioned into a Train set and Hold out set. The Train set is used for defining the regressive model and defining  the semivariogram.
>> 
>> I employ UK to predict on both the Train sample set, as well as the Hold Out sample set. However, there mean absolute error (MAE) from the predictions of the response variable (i.e., zinc for the Meuse dataset) and actual values are very different. I would  expect them to be similar or at least closer. So far I have 
> MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below and 
> advice is welcome.
>> 
>> library(sp)
>> library(gstat)
>> data(meuse)
>> dataset= meuse
>> set.seed(999)
>> 
>> # Split Meuse Dataset into Training and HoldOut Sample datasets
>> Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7* nrow(dataset)))
>> 
>> Training_sample = dataset[Training_ids,]
>> Holdout_sample_allvars = dataset[-Training_ids,]
>> 
>> holdoutvars_df <-(dataset[,which(names(dataset) %in% c("x","y","lead","copper","elev","dist"))])
>> Hold_out_sample = holdoutvars_df[-Training_ids,]
>> 
>> coordinates(Training_sample) <- c('x','y')
>> coordinates(Hold_out_sample) <- c('x','y')
>> 
>> # Semivariogram modeling
>> m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample)
>> m <- vgm("Exp")
>> m <- fit.variogram(m1, m)
>> 
>> 
>> # Apply Univ Krig to Training dataset
>> prediction_training_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Training_sample, model = m)
>> prediction_training_data <- expm1(prediction_training_data$var1.pred)
>> 
>> # Apply Univ Krig to Hold Out dataset
>> prediction_holdout_data <- krige(log(zinc)~lead+copper+elev+dist, Training_sample, Hold_out_sample, model = m)
>> prediction_holdout_data <- expm1(prediction_holdout_data$var1.pred)
>> 
>> # Computing Predictive errors for Training and Hold Out samples respectively
>> training_prediction_error_term <- Training_sample$zinc - prediction_training_data
>> holdout_prediction_error_term <- Holdout_sample_allvars$zinc - prediction_holdout_data
>> 
>> 
>> 
>> # Function that returns Mean Absolute Error
>> mae <- function(error)
>> {
>>    mean(abs(error))
>> }
>> 
>> # Mean Absolute Error metric :
>> # UK Predictive errors for Training sample set , and UK Predictive Errors for HoldOut sample set
>> print(mae(training_prediction_error_term)) #Error for Training sample set
>> print(mae(holdout_prediction_error_term)) #Error for Hold out sample set
>> 
>> 
>> cheers,
>> 
>> Kristopher (Chris)
>> 
>>        [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> R-sig-Geo Info Page - SfS – Seminar for Statistics | ETH ... 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>
> stat.ethz.ch
> A mailing list for discussing the development and use of R functions and 
> packages for handling and analysis of spatial, and particularly 
> geographical, data.
> 
> 
> 
>>
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> R-sig-Geo Info Page - SfS – Seminar for Statistics | ETH ... 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>
> stat.ethz.ch
> A mailing list for discussing the development and use of R functions and 
> packages for handling and analysis of spatial, and particularly 
> geographical, data.
> 
> 
>