[R-sig-Geo] why are idw and gstat different? [SEC=UNCLASSIFIED]
Edzer Pebesma
edzer.pebesma at uni-muenster.de
Wed Jul 9 09:27:25 CEST 2008
Jin.Li at ga.gov.au wrote:
>
> Edzer,
>
> I guess what I mean by “best guess” is that there should possibly be a
> reasonable value for nmax as we know that samples located beyond the
> range are expected to have little contribution to the estimation.
>
This is only true if (i) the relative nugget is small, and (ii) there is
no regression going on in the trend. But where should the threshold be
for (i)? And what to do when there are no samples within the range distance?
>
> Am I right by assuming that the effect of the range has been taking
> into account by “model” in the krige and gstat and we do not need to
> worry about choosing a value for nmax?
>
I'm not sure. The default is that no neighbourhood selection takes place
unless you specify a neighbourhood. The motivation behind this was
simplicity rather than a model.
--
Edzer
>
> Thanks.
>
> Jin
>
> Thanks for identifying the problem. The problem showed how big the impact of
> the choice of nmax value on the estimation is.
>
> In both krige and gstat, the default value for nmax is Inf. I am doing a
> simulation experiment now to assess the performance of several spatial
> interpolators using a few hundreds datasets. It is easy to do it by using the
> default value. But I am wondering what the best guess for nmax is?
>
>
> I've wondered that too; I know that if n >> 10000 or so, maybe
> earlier, things will fail miserably. But how would you define "best"
> in this context?
>
>
>
>
> I assumed that the default value Inf for nmax would take the maximum value of
> the number of samples available.
>
> Yes, that is the case.
>
> After a few trials, I found my assumption
> was wrong as evidenced below. Obviously, it took a value between 20 and 30.
>
> You falsely assume that CPU times has to increase with nmax. This is
> true as long as it is smaller than the data size, but then it suddenly
> drops. This is because if the search neighbourhood is global, (i) the
> covariance matrix decomposition has to be done only once instead N
> times with N the number of prediction locations , and (ii) the
> overhead for neighbourhood selection vanishes.
>
> --
> Edzer
>
>> system.time(x <- krige(log(zinc)~1, meuse, meuse.grid, model = m))
>>
> [using ordinary kriging]
> user system elapsed
> 0.32 0.00 0.31
>
>> system.time(x <- krige(log(zinc)~1, meuse, meuse.grid, model = m, nmax=20))
>>
> [using ordinary kriging]
> user system elapsed
> 0.25 0.02 0.27
>
>> system.time(x <- krige(log(zinc)~1, meuse, meuse.grid, model = m, nmax=30))
>>
> [using ordinary kriging]
> user system elapsed
> 0.41 0.02 0.43
>
>> system.time(x <- krige(log(zinc)~1, meuse, meuse.grid, model = m, nmax=50))
>>
> [using ordinary kriging]
> user system elapsed
> 0.89 0.00 0.89
>
> Please clarify this.
> Thanks,
> Jin
>
> -----Original Message-----
> From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de]
> Sent: Tuesday, 8 July 2008 5:35
> To: Li Jin
> Cc: r-sig-geo at stat.math.ethz.ch <mailto:r-sig-geo at stat.math.ethz.ch>
> Subject: Re: [R-sig-Geo] why are idw and gstat different? [SEC=UNCLASSIFIED]
>
> Jin, you specified nmax = 7 in the second call, but not in the first.
> Comparing idp values is easiest when other things remain equal.
> --
> Edzer
>
> Jin.Li at ga.gov.au <mailto:Jin.Li at ga.gov.au> wrote:
>
>> Dear there,
>>
>> I tried to compare idw and gstat in library(gstat). I specified idp as 0.5
>>
> in
>
>> both functions, I expected identical results, but what I got are different
>>
> as
>
>> shown for the first five samples. Please help. Thanks.
>>
>>
>>
>>
>>
>>> data(meuse)
>>>
>>>
>>
>>
>>> coordinates(meuse) = ~x+y
>>>
>>>
>>
>>
>>> data(meuse.grid)
>>>
>>>
>>
>>
>>> gridded(meuse.grid) = ~x+y
>>>
>>>
>>
>>
>>> x <- idw(zinc~1, meuse, meuse.grid, idp=0.5)
>>>
>>>
>> [inverse distance weighted interpolation]
>>
>>
>>
>>> as.data.frame(x)[1:5,]
>>>
>>>
>> var1.pred var1.var x y
>>
>> 1 482.8753 NA 181180 333740
>>
>> 2 487.0065 NA 181140 333700
>>
>> 3 483.9747 NA 181180 333700
>>
>> 4 481.2115 NA 181220 333700
>>
>> 5 494.8720 NA 181100 333660
>>
>>
>>
>>
>>
>>> data(meuse)
>>>
>>>
>>
>>
>>> data(meuse.grid)
>>>
>>>
>>
>>
>>> meuse.gstat <- gstat(id = "zinc", formula = zinc ~ 1, locations = ~ x + y,
>>>
>>>
>> + data = meuse, nmax = 7, set = list(idp = .5))
>>
>>
>>
>>> meuse.gstat
>>>
>>>
>> data:
>>
>> zinc : formula = zinc`~`1 ; data dim = 155 x 12 nmax = 7
>>
>> set idp = 0.5;
>>
>> ~x + y
>>
>>
>>
>>> z <- predict(meuse.gstat, meuse.grid)
>>>
>>>
>> [inverse distance weighted interpolation]
>>
>>
>>
>>> z[1:5,]
>>>
>>>
>> x y zinc.pred zinc.var
>>
>> 1 181180 333740 626.3628 NA
>>
>> 2 181140 333700 645.9319 NA
>>
>> 3 181180 333700 629.7041 NA
>>
>> 4 181220 333700 615.1368 NA
>>
>> 5 181100 333660 682.3401 NA
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Jin
>>
>> --------------------------------------------
>>
>> Jin Li, PhD
>>
>> Spatial Modeller/
>>
>> Computational Statistician
>>
>> Marine & Coastal Environment
>>
>> Geoscience Australia
>>
>>
>>
>> Ph: 61 (02) 6249 9899
>>
>> Fax: 61 (02) 6249 9956
>>
>> email: jin.li at ga.gov.au <mailto:jin.li at ga.gov.au> <mailto:jin.li at ga.gov.au>
>>
>> --------------------------------------------
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch <mailto:R-sig-Geo at stat.math.ethz.ch>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>
>
>
>
>
>
> --
> Edzer Pebesma
> Institute for Geoinformatics (IfGI), University of Münster, Weseler
> Straße 253, 48151 Münster. Phone: +49 251 8333081 Fax: +49 251 8339763
> email: edzer.pebesma at uni-muenster.de <mailto:edzer.pebesma at uni-muenster.de> http://ifgi.uni-muenster.de/
--
Edzer Pebesma
Institute for Geoinformatics (IfGI), University of Münster, Weseler
Straße 253, 48151 Münster. Phone: +49 251 8333081 Fax: +49 251 8339763
email: edzer.pebesma at uni-muenster.de http://ifgi.uni-muenster.de/
More information about the R-sig-Geo
mailing list