[R-sig-Geo] gstat now uses Lapack / failing cokriging

Edzer Pebesma edzer.pebesma at uni-muenster.de
Mon Nov 16 15:38:05 CET 2015



On 16/11/15 12:54, Bruin, Sytze de wrote:
> On 10/20/2015 05:58 AM, Edzer Pebesma wrote:
> 
>> gstat 1.1-0, now on CRAN, no longer comes with its own functions for 
>> matrix factorization and solving systems of equations [1], but now 
>> directly uses Lapack (dpotrf and dtrsm) through R's own lapack interface 
>> and R_ext/Lapack.h header files. 
> 
>> For global kriging at one location from 10,000 observations, as in 
> 
>> library(sp) 
>> library(gstat) 
>> set.seed(1331) 
>> n = 10000 
>> pts = SpatialPoints(cbind(x = runif(n), y = runif(n))) 
>> pts$z = runif(n) 
>> k <- krige(z~1, pts, pts[1,], vgm(1, "Exp", 1)) 
> 
>> I see a speed increase from 120 (gstat 1.0-26) to 46 seconds; using 
>> openblas on a 4 core laptop brings this down to 15 seconds - I expect 
>> sth similar with MKL/RevoR. 
> 
>> For local kriging on large data sets with smaller neighborhoods and many 
>> locations, I wouldn't expect large improvements; for global kriging of 
>> large data sets to many prediction locations, krige0 may be faster when 
>> you use openblas or MKL - as long as things fit in memory. 
> 
>>  I'd be happy to hear experiences (positive and negative), or otherwise 
>> reactions or questions.
> 
> Hi Edzer,
> 
> With the previous version of gstat the following code for doing cokriging using a linear model of coregionalization worked fine.
> 
> library(sp)
> library(gstat)
> 
> # some data
> x <- c(215, 330, 410, 470, 545)
> y <- c(230, 310, 330, 340, 365)
> fc <- c(0.211, 0.251, 0.281, 0.262, 0.242)
> por <- c(0.438, 0.457, 0.419, 0.430, 0.468)
> Allier <- data.frame(x, y, fc, por)
> coordinates(Allier) = ~x+y
> 
> # gstat object for co-kriging using linear model of co-regionalization
> g <- gstat(id=c("fc"), formula=fc~1, data=Allier,
>            model=vgm(0.00247, "Sph", 480, 0.00166))
> g <- gstat(g, id="por", formula=por~1, data=Allier,
>            model=vgm(0.00239, "Sph", 480, 0.00118))
> g <- gstat(g, id=c("fc", "por"),
>            model=vgm(0.00151, "Sph", 480, -0.00124))
> 
> # predict at single point
> point.krig <- predict(g, SpatialPoints(data.frame(x=450, y=350)))
> 
> However, using the recently downloaded version of gstat, I get:
> 
> Linear Model of Coregionalization found. Good.
> [using ordinary cokriging]
> Warning message:
> In predict.gstat(g, SpatialPoints(data.frame(x = 450, y = 350))) :
>   Warning: Covariance matrix (nearly) singular at location [450,350,0]: skipping...
> 
> Yet, the covariance matrix is not nearly singular. This appears to be caused by recent changes to the library.

Hi Sytze, thanks for the clear test case. The warning message is wrong
but the problem remains: the covariance matrix is not singular, but
non-positive definite:

> r
             [,1]        [,2]        [,3]        [,4]        [,5]
 [1,] 0.004130000 0.001419390 0.000895995 0.000565582 0.000224073
 [2,] 0.001419390 0.004130000 0.001839760 0.001397620 0.000879083
 [3,] 0.000895995 0.001839760 0.004130000 0.002003000 0.001423810
 [4,] 0.000565582 0.001397620 0.002003000 0.004130000 0.001865300
 [5,] 0.000224073 0.000879083 0.001423810 0.001865300 0.004130000
 [6,] 0.001510000 0.002107720 0.001787750 0.001585760 0.001376980
 [7,] 0.002107720 0.001510000 0.002364710 0.002094420 0.001777420
 [8,] 0.001787750 0.002364710 0.001510000 0.002464510 0.002110430
 [9,] 0.001585760 0.002094420 0.002464510 0.001510000 0.002380320
[10,] 0.001376980 0.001777420 0.002110430 0.002380320 0.001510000
             [,6]        [,7]        [,8]        [,9]       [,10]
 [1,] 0.001510000 0.002107720 0.001787750 0.001585760 0.001376980
 [2,] 0.002107720 0.001510000 0.002364710 0.002094420 0.001777420
 [3,] 0.001787750 0.002364710 0.001510000 0.002464510 0.002110430
 [4,] 0.001585760 0.002094420 0.002464510 0.001510000 0.002380320
 [5,] 0.001376980 0.001777420 0.002110430 0.002380320 0.001510000
 [6,] 0.003570000 0.001373420 0.000866975 0.000547264 0.000216816
 [7,] 0.001373420 0.003570000 0.001780170 0.001352350 0.000850611
 [8,] 0.000866975 0.001780170 0.003570000 0.001938130 0.001377690
 [9,] 0.000547264 0.001352350 0.001938130 0.003570000 0.001804880
[10,] 0.000216816 0.000850611 0.001377690 0.001804880 0.003570000
> chol(r)
Error in chol.default(r) :
  the leading minor of order 9 is not positive definite

As explained in https://en.wikipedia.org/wiki/Cholesky_decomposition ,
Choleski differs from LDL', which gstat used to do before 1.1-0. As Tim
Peterson reminded me off-list, lapack also has LDL'; will look into it.

Changing the nugget of the cross variogram to -0.001 makes the matrix
positive definite, for your toy case.

> 
> Regards,
> 
> Sytze de Bruin
> Wageningen University
> Laboratory of Geo-Information Science and Remote Sensing 
> Telephone: +31 317 481830
> Mobile: +31 6 13898626
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> 

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi),  University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20151116/14982b8f/attachment.bin>


More information about the R-sig-Geo mailing list