[R-sig-Geo] R/gstat UK with indicators

Roger Bivand Roger.Bivand at nhh.no
Wed Feb 20 08:55:51 CET 2008


On Wed, 20 Feb 2008, Tomislav Hengl wrote:

>
> David,
>
> Thanks for you note. It is a very clear example and I have experienced
> this problem many times.
>
> If I can be of any assistance, here are few discussion points that might
> put some light on the whole thing.
>
> Gstat does not support any stat models different than linear. This is
> mainly because gstat implements the so-called "Kriging with External
> Drift" algorithm (covariances of residuals extended with auxiliary
> predictors). This algorithm is rather elegant and gives then equivalent
> results as if you would first estimate linear regression by GLS and then
> interpolate and sum the residuals. But the drawback of KED is that it only
> works with linear models (multiple linear regression). As far as I know,
> KED has been implemented in all software that can run universal kriging
> interpolation with auxiliary predictors (maps).
>
> The alternative is to run the so-called "Regression-kriging" approach
> where you separately model the deterministic and stochastic part (as you
> have also demonstrated; see also sec 2.1 of my lecture notes). However, I
> think that there is a complete theory missing (maybe you should report on
> this?)! For example, if you apply binomial link function, GAMs or
> regression trees, I have a feeling that you should also consider the
> spatio-correlation during the estimation of the model parameters. In the
> case of multiple regression, covariance matrix is used to account for
> spreading (clustering) of the points in the space. In your example, you
> fit a GLM that completely ignores location of the points, so I have a
> feeling that it is not statistically optimal.
>
> We have also reported recently on interpolation of soil categories
> (http://dx.doi.org/10.1016/j.geoderma.2007.04.022). We discovered that the
> easiest thing to do is to actually have memberships instead of indicators
> (it is also easier to fit variograms if you work with memberships). Then
> the system is equivalent to regression-kriging (after logit
> transformation), so then you do not have to worry about the link function
> (see also section 4.3.3 and Fig. 4.15 of my lecture notes).

Would it be possible to try several of the alternatives and see where they 
go? One possibility is the use of GLMM (the MASS variant glmmPQL()) as 
reported in Dormann et al. 2007 with tricks to include a spatial 
correlation term? Reference in:

https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003084.html

Is another to follow up Paulo Ribeiro's reply to a geoRglm question 
yesterday about using the trend= argument in setting up the controls for 
Binomial or Poisson kriging?

Further, I don't think that either spBayes or ramps get you there yet, but 
I would be surprised if at least some of the infrastructure isn't included 
already. Section 5.2 in Banerjee et al. is suggestive, and spBayes is by 
the book's authors.

As Tom says, this is all fairly speculative and fresh, but at least offers 
something for small data sets and people with lots of patience! The 
spBayes JSS paper suggests that their engine performs better than geoRglm, 
but I guess there are horses for courses. The JSS link:

http://www.jstatsoft.org/v19/i04

Hope this helps,

Roger

>
> All the best,
>
> Tom Hengl
> http://spatial-analyst.net
>
> Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of
> Environmental Variables. EUR 22904 EN Scientific and Technical Research
> series, Office for Official Publications of the European Communities,
> Luxemburg, 143 pp.
> http://bookshop.europa.eu/uri?target=EUB:NOTICE:LBNA22904:EN:HTML
>
>
>
>
>> I am sure Edzer can answer this, but I post it to the list because it
>> may be of general interest.
>>
>> If I compute an indicator:
>>
>>> data(meuse)
>>> meuse$i.Pb.med <- meuse$lead < median(meuse$lead)
>>
>> I can of course compute an indicator variogram and krige at new
>> locations:
>>
>>> vi <- variogram(i.Pb.med ~1, meuse)
>>> plot(vi, pl=T)
>>> (vim <- fit.variogram(vi, vgm(0.17, "Sph", 600, 0.1)))
>>   model      psill    range
>> 1   Nug 0.08772438   0.0000
>> 2   Sph 0.18157082 691.2488
>>> plot(vi, pl=T, model=vim)
>>> ki <- krige(i.Pb.med ~1, meuse, newdata=meuse.grid, model=vm)
>> [using ordinary kriging]
>>> summary(ki$var1.pred)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> 0.03628 0.33830 0.64020 0.58080 0.81610 0.95040
>>
>> I can also fit a logistic regression model to the indicator with a GLM:
>>
>>> (gm <- glm(i.Pb.med ~ dist, family="binomial", data=meuse at data))
>> (Intercept)         dist
>>      -2.562       11.662
>> Degrees of Freedom: 154 Total (i.e. Null);  153 Residual
>> Null Deviance:      214.9   Residual Deviance: 131.2        AIC: 135.2
>>
>> Here the ~ in the formula is interpreted according to the link function.
>>
>> I can compute residuals from the GLM, get the residual variogram, and
>> model it:
>>
>>> meuse$glm.r <- residuals(gm)
>>> v <- variogram(glm.r ~ 1, meuse, cutoff=1000)
>>> plot(v, pl=T)
>>> vm <- vgm(0.45, "Exp", 450, 0.6) # eyeball fit
>>
>> But, what happens when I use the same formula in gstat?
>>
>> First, just to compute the residual variogram:
>>> vr <- variogram(i.Pb.med ~ dist, meuse, cutoff=1000)
>> The two residual variograms are very different:
>>
>>> max(v$gamma)
>> [1] 1.001015
>>> max(vr$gamma)
>> [1] 0.1741679
>>
>> So clearly the formula (i.Pb.med ~ dist) is being interpreted as a
>> linear model, not a logit model.
>>
>> Second, an attempt at kriging with external drift:
>>
>>> ked <- krige(i.Pb.med ~ dist, loc=meuse, newdata=meuse.grid, model=vm)
>> [using universal kriging]
>>
>> Since I could not specify a link function, it seems that a linear model
>> is being fit (not a logit model).
>>
>> And we see very strange results for an indicator:
>>
>>> summary(ked$var1.pred)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> -0.1235  0.2719  0.6550  0.6056  0.9139  1.4990
>>
>> So (after all that buildup), I just want to know if my interpretation is
>> correct, that gstat is not interpreting the formula in krige() or in
>> variogram() as a GLM, rather as a linear model, which doesn't make sense
>> with indicators.
>>
>> Thank you,
>>
>> D G Rossiter
>> Senior University Lecturer
>> Department of Earth Systems Analysis (DESA)
>> International Institute for Geo-Information Science and Earth
>> Observation (ITC)
>> Enschede, The Netherlands
>> Internet: http://www.itc.nl/personal/rossiter
>>
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list