[R-sig-Geo] R/gstat UK with indicators

Wed Feb 20 14:10:20 CET 2008

Dear Tom, Roger and Edzer

Here are my 2 cents:
I'm not surprised with the diferences Tom pointed, since this is also my
experience.
Taking residuas from a GLM is rather different from using indicator
variables. Also there may be even some differences depending on which
kind of GLM residuals you take.
Run a GLM and exploring the residuals e.g. via variograms, is something I
consider a routine practice, but it does not aways tell you the whole
story. Fitting a GLGM (generealised linear geostatitical model) can be
more conclusive since you can do infereces on the model parameters
and access the relevance of the spatial term more objectively.
This was the original motivation for geoRglm doing all the modelling at
once and not by two steps such as fitting a model without correlation and
then modelling residuals. This came with the extra burden of calibrating
the MCMC algorithms.
Later spBayes came to scene and indeed looks promissing proposing a more
general framework whereas geoRglm is rather specific to univariate
binomial and poison models.

As Roger says there is scope to play around with other alternatives like
the GLMM or maybe MCMCpack, but this is certainly not ready
"out-of-the-box" and code will need to be adapted for spatial purposes.

regards to all

Paulo Justiniano Ribeiro Jr
LEG (Laboratorio de Estatistica e Geoinformacao)
Universidade Federal do Parana
Caixa Postal 19.081
CEP 81.531-990
Curitiba, PR  -  Brasil
Tel: (+55) 41 3361 3573
Fax: (+55) 41 3361 3141
e-mail: paulojus AT  ufpr  br
http://www.leg.ufpr.br/~paulojus

-------------------------------------------------------------------------
53a Reuniao Anual da Regiao Brasileira da Soc. Internacional de Biometria
14 a 16/05/2008, UFLA, Lavras,MG
http://www.rbras.org.br/rbras53
-------------------------------------------------------------------------

On Wed, 20 Feb 2008, Roger Bivand wrote:

> On Wed, 20 Feb 2008, Tomislav Hengl wrote:
>
> >
> > David,
> >
> > Thanks for you note. It is a very clear example and I have experienced
> > this problem many times.
> >
> > If I can be of any assistance, here are few discussion points that might
> > put some light on the whole thing.
> >
> > Gstat does not support any stat models different than linear. This is
> > mainly because gstat implements the so-called "Kriging with External
> > Drift" algorithm (covariances of residuals extended with auxiliary
> > predictors). This algorithm is rather elegant and gives then equivalent
> > results as if you would first estimate linear regression by GLS and then
> > interpolate and sum the residuals. But the drawback of KED is that it only
> > works with linear models (multiple linear regression). As far as I know,
> > KED has been implemented in all software that can run universal kriging
> > interpolation with auxiliary predictors (maps).
> >
> > The alternative is to run the so-called "Regression-kriging" approach
> > where you separately model the deterministic and stochastic part (as you
> > have also demonstrated; see also sec 2.1 of my lecture notes). However, I
> > think that there is a complete theory missing (maybe you should report on
> > this?)! For example, if you apply binomial link function, GAMs or
> > regression trees, I have a feeling that you should also consider the
> > spatio-correlation during the estimation of the model parameters. In the
> > case of multiple regression, covariance matrix is used to account for
> > spreading (clustering) of the points in the space. In your example, you
> > fit a GLM that completely ignores location of the points, so I have a
> > feeling that it is not statistically optimal.
> >
> > We have also reported recently on interpolation of soil categories
> > (http://dx.doi.org/10.1016/j.geoderma.2007.04.022). We discovered that the
> > easiest thing to do is to actually have memberships instead of indicators
> > (it is also easier to fit variograms if you work with memberships). Then
> > the system is equivalent to regression-kriging (after logit
> > transformation), so then you do not have to worry about the link function
> > (see also section 4.3.3 and Fig. 4.15 of my lecture notes).
>
> Would it be possible to try several of the alternatives and see where they
> go? One possibility is the use of GLMM (the MASS variant glmmPQL()) as
> reported in Dormann et al. 2007 with tricks to include a spatial
> correlation term? Reference in:
>
> https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003084.html
>
> Is another to follow up Paulo Ribeiro's reply to a geoRglm question
> yesterday about using the trend= argument in setting up the controls for
> Binomial or Poisson kriging?
>
> Further, I don't think that either spBayes or ramps get you there yet, but
> I would be surprised if at least some of the infrastructure isn't included
> already. Section 5.2 in Banerjee et al. is suggestive, and spBayes is by
> the book's authors.
>
> As Tom says, this is all fairly speculative and fresh, but at least offers
> something for small data sets and people with lots of patience! The
> spBayes JSS paper suggests that their engine performs better than geoRglm,
> but I guess there are horses for courses. The JSS link:
>
> http://www.jstatsoft.org/v19/i04
>
> Hope this helps,
>
> Roger
>
> >
> > All the best,
> >
> > Tom Hengl
> > http://spatial-analyst.net
> >
> > Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of
> > Environmental Variables. EUR 22904 EN Scientific and Technical Research
> > series, Office for Official Publications of the European Communities,
> > Luxemburg, 143 pp.
> > http://bookshop.europa.eu/uri?target=EUB:NOTICE:LBNA22904:EN:HTML
> >
> >
> >
> >
> >> I am sure Edzer can answer this, but I post it to the list because it
> >> may be of general interest.
> >>
> >> If I compute an indicator:
> >>
> >>> data(meuse)
> >>> meuse$i.Pb.med <- meuse$lead < median(meuse$lead)
> >>
> >> I can of course compute an indicator variogram and krige at new
> >> locations:
> >>
> >>> vi <- variogram(i.Pb.med ~1, meuse)
> >>> plot(vi, pl=T)
> >>> (vim <- fit.variogram(vi, vgm(0.17, "Sph", 600, 0.1)))
> >>   model      psill    range
> >> 1   Nug 0.08772438   0.0000
> >> 2   Sph 0.18157082 691.2488
> >>> plot(vi, pl=T, model=vim)
> >>> ki <- krige(i.Pb.med ~1, meuse, newdata=meuse.grid, model=vm)
> >> [using ordinary kriging]
> >>> summary(ki$var1.pred)
> >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >> 0.03628 0.33830 0.64020 0.58080 0.81610 0.95040
> >>
> >> I can also fit a logistic regression model to the indicator with a GLM:
> >>
> >>> (gm <- glm(i.Pb.med ~ dist, family="binomial", data=meuse at data))
> >> (Intercept)         dist
> >>      -2.562       11.662
> >> Degrees of Freedom: 154 Total (i.e. Null);  153 Residual
> >> Null Deviance:      214.9   Residual Deviance: 131.2        AIC: 135.2
> >>
> >> Here the ~ in the formula is interpreted according to the link function.
> >>
> >> I can compute residuals from the GLM, get the residual variogram, and
> >> model it:
> >>
> >>> meuse$glm.r <- residuals(gm)
> >>> v <- variogram(glm.r ~ 1, meuse, cutoff=1000)
> >>> plot(v, pl=T)
> >>> vm <- vgm(0.45, "Exp", 450, 0.6) # eyeball fit
> >>
> >> But, what happens when I use the same formula in gstat?
> >>
> >> First, just to compute the residual variogram:
> >>> vr <- variogram(i.Pb.med ~ dist, meuse, cutoff=1000)
> >> The two residual variograms are very different:
> >>
> >>> max(v$gamma)
> >> [1] 1.001015
> >>> max(vr$gamma)
> >> [1] 0.1741679
> >>
> >> So clearly the formula (i.Pb.med ~ dist) is being interpreted as a
> >> linear model, not a logit model.
> >>
> >> Second, an attempt at kriging with external drift:
> >>
> >>> ked <- krige(i.Pb.med ~ dist, loc=meuse, newdata=meuse.grid, model=vm)
> >> [using universal kriging]
> >>
> >> Since I could not specify a link function, it seems that a linear model
> >> is being fit (not a logit model).
> >>
> >> And we see very strange results for an indicator:
> >>
> >>> summary(ked$var1.pred)
> >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >> -0.1235  0.2719  0.6550  0.6056  0.9139  1.4990
> >>
> >> So (after all that buildup), I just want to know if my interpretation is
> >> correct, that gstat is not interpreting the formula in krige() or in
> >> variogram() as a GLM, rather as a linear model, which doesn't make sense
> >> with indicators.
> >>
> >> Thank you,
> >>
> >> D G Rossiter
> >> Senior University Lecturer
> >> Department of Earth Systems Analysis (DESA)
> >> International Institute for Geo-Information Science and Earth
> >> Observation (ITC)
> >> Enschede, The Netherlands
> >> Internet: http://www.itc.nl/personal/rossiter
> >>
> >>
> >> _______________________________________________
> >> R-sig-Geo mailing list
> >> R-sig-Geo at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>