[R-sig-Geo] Comparison of prediction performance (mapping accuracy) - how to test if a method B is significantly more accurate than method A?

Thu Aug 28 17:10:22 CEST 2014

Dear list,

I'm trying to standardize a procedure for comparing the performance of 
competing spatial prediction methods. I know this has been discussed in 
the literature and on various mailing lists, but I would be interested 
in any opinions I could get.

I am comparing (see below) two spatial prediction methods 
(regression-kriging and inverse distance interpolation) using 5-fold 
cross-validation and then testing whether the difference between the two 
is significant. I concluded that there are two possible tests for the 
cross-validation residuals:
1. F-test to compare variances,
2. t-test to compare mean values.

Both tests might be important, but the F-test ("var.test") seems to be 
the more relevant one for answering "is method B significantly more 
accurate than method A?". The second test ("t.test") appears to matter 
only if it rejects the null hypothesis, which would mean that one of the 
methods systematically over- or under-estimates (the mean residual 
should be 0). Did I miss an important test?
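
To make the roles of the two tests concrete, here is a minimal sketch in 
base R on synthetic residuals. The vectors res_a and res_b and the helper 
accuracy() are hypothetical stand-ins, not the meuse results; the sample 
size of 153 is only chosen to match the degrees of freedom reported in 
the output further down. The mean error (ME) is what the t-test looks at, 
while the RMSE is closely tied to the variance that the F-test compares.

```r
# Synthetic cross-validation residuals (hypothetical, NOT the meuse results)
set.seed(1)
n <- 153
res_a <- rnorm(n, mean = 0, sd = 0.50)  # stand-in for method A residuals
res_b <- rnorm(n, mean = 0, sd = 0.40)  # stand-in for method B residuals

# Summary accuracy measures: mean error (bias) and root mean squared error
accuracy <- function(res) c(ME = mean(res), RMSE = sqrt(mean(res^2)))
acc <- rbind(A = accuracy(res_a), B = accuracy(res_b))
print(acc)

# F-test: is the residual variance of A greater than that of B?
f <- var.test(res_a, res_b, alternative = "greater")

# t-test: do the mean residuals of the two methods differ?
tt <- t.test(res_a, res_b)

cat("F =", round(f$statistic, 3), " p =", signif(f$p.value, 3), "\n")
cat("t =", round(tt$statistic, 3), " p =", signif(tt$p.value, 3), "\n")
```

With real data, res_a and res_b would be replaced by the cross-validation 
residuals of the two methods, as in the session below.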

Thank you!

R> library(GSIF)
R> library(gstat)
R> library(sp)
R> set.seed(2419)
R> demo(meuse, echo=FALSE)
R> omm1 <- fit.gstatModel(meuse, log1p(om)~dist+soil, meuse.grid)
Fitting a linear model...
Fitting a 2D variogram...
Saving an object of class 'gstatModel'...
R> rk1 <- predict(omm1, meuse.grid)
R> meuse.s <- meuse[!is.na(meuse$om),]
R> ok1 <- krige.cv(log1p(om)~1, meuse.s, nfold=5)
R> var.test(ok1$residual, rk1@validation$residual, alternative = "greater")

         F test to compare two variances

data:  ok1$residual and rk1@validation$residual
F = 1.2283, num df = 152, denom df = 152, p-value = 0.103
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
  0.9398662       Inf
sample estimates:
ratio of variances
           1.228322
R> ## No significant difference
R> t.test(ok1$residual, rk1@validation$residual)

         Welch Two Sample t-test

data:  ok1$residual and rk1@validation$residual
t = -0.0204, df = 300.842, p-value = 0.9837
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -0.07084667  0.06939220
sample estimates:
    mean of x    mean of y
0.0004766718 0.0012039089
R> ## Again, no significant difference

R> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
other attached packages:
[1] randomForest_4.6-7 nortest_1.0-2
[3] gstat_1.0-19       GSIF_0.4-2
[5] sp_1.0-15          gap_1.1-12