[R-sig-Geo] R/gstat UK with indicators

Tomislav Hengl hengl at science.uva.nl
Wed Feb 20 08:19:04 CET 2008


David,

Thanks for you note. It is a very clear example and I have experienced
this problem many times.

If I can be of any assistance, here are few discussion points that might
put some light on the whole thing.

Gstat does not support any stat models different than linear. This is
mainly because gstat implements the so-called "Kriging with External
Drift" algorithm (covariances of residuals extended with auxiliary
predictors). This algorithm is rather elegant and gives then equivalent
results as if you would first estimate linear regression by GLS and then
interpolate and sum the residuals. But the drawback of KED is that it only
works with linear models (multiple linear regression). As far as I know,
KED has been implemented in all software that can run universal kriging
interpolation with auxiliary predictors (maps).

The alternative is to run the so-called "Regression-kriging" approach
where you separately model the deterministic and stochastic part (as you
have also demonstrated; see also sec 2.1 of my lecture notes). However, I
think that there is a complete theory missing (maybe you should report on
this?)! For example, if you apply binomial link function, GAMs or
regression trees, I have a feeling that you should also consider the
spatio-correlation during the estimation of the model parameters. In the
case of multiple regression, covariance matrix is used to account for
spreading (clustering) of the points in the space. In your example, you
fit a GLM that completely ignores location of the points, so I have a
feeling that it is not statistically optimal.

We have also reported recently on interpolation of soil categories
(http://dx.doi.org/10.1016/j.geoderma.2007.04.022). We discovered that the
easiest thing to do is to actually have memberships instead of indicators
(it is also easier to fit variograms if you work with memberships). Then
the system is equivalent to regression-kriging (after logit
transformation), so then you do not have to worry about the link function
(see also section 4.3.3 and Fig. 4.15 of my lecture notes).

All the best,

Tom Hengl
http://spatial-analyst.net

Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of
Environmental Variables. EUR 22904 EN Scientific and Technical Research
series, Office for Official Publications of the European Communities,
Luxemburg, 143 pp.
http://bookshop.europa.eu/uri?target=EUB:NOTICE:LBNA22904:EN:HTML




> I am sure Edzer can answer this, but I post it to the list because it
> may be of general interest.
>
> If I compute an indicator:
>
>> data(meuse)
>> meuse$i.Pb.med <- meuse$lead < median(meuse$lead)
>
> I can of course compute an indicator variogram and krige at new
> locations:
>
>> vi <- variogram(i.Pb.med ~1, meuse)
>> plot(vi, pl=T)
>> (vim <- fit.variogram(vi, vgm(0.17, "Sph", 600, 0.1)))
>   model      psill    range
> 1   Nug 0.08772438   0.0000
> 2   Sph 0.18157082 691.2488
>> plot(vi, pl=T, model=vim)
>> ki <- krige(i.Pb.med ~1, meuse, newdata=meuse.grid, model=vm)
> [using ordinary kriging]
>> summary(ki$var1.pred)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.03628 0.33830 0.64020 0.58080 0.81610 0.95040
>
> I can also fit a logistic regression model to the indicator with a GLM:
>
>> (gm <- glm(i.Pb.med ~ dist, family="binomial", data=meuse at data))
> (Intercept)         dist
>      -2.562       11.662
> Degrees of Freedom: 154 Total (i.e. Null);  153 Residual
> Null Deviance:      214.9   Residual Deviance: 131.2        AIC: 135.2
>
> Here the ~ in the formula is interpreted according to the link function.
>
> I can compute residuals from the GLM, get the residual variogram, and
> model it:
>
>> meuse$glm.r <- residuals(gm)
>> v <- variogram(glm.r ~ 1, meuse, cutoff=1000)
>> plot(v, pl=T)
>> vm <- vgm(0.45, "Exp", 450, 0.6) # eyeball fit
>
> But, what happens when I use the same formula in gstat?
>
> First, just to compute the residual variogram:
>> vr <- variogram(i.Pb.med ~ dist, meuse, cutoff=1000)
> The two residual variograms are very different:
>
>> max(v$gamma)
> [1] 1.001015
>> max(vr$gamma)
> [1] 0.1741679
>
> So clearly the formula (i.Pb.med ~ dist) is being interpreted as a
> linear model, not a logit model.
>
> Second, an attempt at kriging with external drift:
>
>> ked <- krige(i.Pb.med ~ dist, loc=meuse, newdata=meuse.grid, model=vm)
> [using universal kriging]
>
> Since I could not specify a link function, it seems that a linear model
> is being fit (not a logit model).
>
> And we see very strange results for an indicator:
>
>> summary(ked$var1.pred)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -0.1235  0.2719  0.6550  0.6056  0.9139  1.4990
>
> So (after all that buildup), I just want to know if my interpretation is
> correct, that gstat is not interpreting the formula in krige() or in
> variogram() as a GLM, rather as a linear model, which doesn't make sense
> with indicators.
>
> Thank you,
>
> D G Rossiter
> Senior University Lecturer
> Department of Earth Systems Analysis (DESA)
> International Institute for Geo-Information Science and Earth
> Observation (ITC)
> Enschede, The Netherlands
> Internet: http://www.itc.nl/personal/rossiter
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>




More information about the R-sig-Geo mailing list