[R-sig-Geo] R/gstat UK with indicators

Tue Feb 19 14:39:07 CET 2008

I am sure Edzer can answer this, but I post it to the list because it
may be of general interest.

If I compute an indicator:

> data(meuse)
> meuse$i.Pb.med <- meuse$lead < median(meuse$lead)

I can of course compute an indicator variogram and krige at new
locations:

> vi <- variogram(i.Pb.med ~1, meuse)
> plot(vi, pl=T)
> (vim <- fit.variogram(vi, vgm(0.17, "Sph", 600, 0.1)))
  model      psill    range
1   Nug 0.08772438   0.0000
2   Sph 0.18157082 691.2488
> plot(vi, pl=T, model=vim)
> ki <- krige(i.Pb.med ~1, meuse, newdata=meuse.grid, model=vm)
[using ordinary kriging]
> summary(ki$var1.pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.03628 0.33830 0.64020 0.58080 0.81610 0.95040 

I can also fit a logistic regression model to the indicator with a GLM:

> (gm <- glm(i.Pb.med ~ dist, family="binomial", data=meuse at data))
(Intercept)         dist  
     -2.562       11.662  
Degrees of Freedom: 154 Total (i.e. Null);  153 Residual
Null Deviance:      214.9   Residual Deviance: 131.2        AIC: 135.2 

Here the ~ in the formula is interpreted according to the link function.

I can compute residuals from the GLM, get the residual variogram, and
model it:

> meuse$glm.r <- residuals(gm)
> v <- variogram(glm.r ~ 1, meuse, cutoff=1000)	
> plot(v, pl=T)
> vm <- vgm(0.45, "Exp", 450, 0.6) # eyeball fit

But, what happens when I use the same formula in gstat?

First, just to compute the residual variogram:
> vr <- variogram(i.Pb.med ~ dist, meuse, cutoff=1000)
The two residual variograms are very different:

> max(v$gamma)
[1] 1.001015
> max(vr$gamma)
[1] 0.1741679

So clearly the formula (i.Pb.med ~ dist) is being interpreted as a
linear model, not a logit model.

Second, an attempt at kriging with external drift:

> ked <- krige(i.Pb.med ~ dist, loc=meuse, newdata=meuse.grid, model=vm)
[using universal kriging]

Since I could not specify a link function, it seems that a linear model
is being fit (not a logit model).

And we see very strange results for an indicator:

> summary(ked$var1.pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1235  0.2719  0.6550  0.6056  0.9139  1.4990 

So (after all that buildup), I just want to know if my interpretation is
correct, that gstat is not interpreting the formula in krige() or in
variogram() as a GLM, rather as a linear model, which doesn't make sense
with indicators.

Thank you,

D G Rossiter
Senior University Lecturer
Department of Earth Systems Analysis (DESA)
International Institute for Geo-Information Science and Earth
Observation (ITC)
Enschede, The Netherlands
Internet: http://www.itc.nl/personal/rossiter