[R-sig-Geo] Fitting nested variograms to empirical variograms

Sat Oct 16 14:14:42 CEST 2004

Dear Edzer

Thanks for your comment. I don't think we strongly disagree on these 
matters. I hope this response clarifies my current viewpoint.

* I certainly don't want to debunk the empirical variogram. I find it 
very useful as an exploratory tool. For example, the empirical variogram 
might reveal pseudo-periodicity in the data, and it might reveal 
directional effects. For some projects there is also the question of 
whether there is actually any spatial structure in the data, which a 
variogram plot of residuals [or standardised residuals if you have a 
GLM] would reveal. Plotting the empirical variogram might also 
reveal whether something has gone wrong when fitting by m.l.e.
My recommendation: "Always plot the empirical variogram [of 
standardised residuals]".
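
To make the recommendation concrete, here is a minimal sketch of the
classical method-of-moments (Matheron) estimator of the empirical
variogram, written in plain Python rather than taken from any R package.
The function name `empirical_variogram`, its arguments, and the simple
equal-width distance binning are assumptions made for illustration; the
input is assumed to be the [standardised] residuals and their coordinates.

```python
import math
from collections import defaultdict


def empirical_variogram(coords, values, bin_width, max_dist):
    """Return (mean distance, semivariance, pair count) for each distance bin.

    Hypothetical helper: coords is a list of coordinate tuples, values the
    residuals observed at those coordinates.
    """
    sq_diff = defaultdict(float)   # sum of squared differences per bin
    dist_sum = defaultdict(float)  # sum of pair distances per bin
    n_pairs = defaultdict(int)     # number of pairs per bin
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            if h > max_dist:
                continue  # ignore pairs beyond the largest lag of interest
            b = int(h // bin_width)
            sq_diff[b] += (values[i] - values[j]) ** 2
            dist_sum[b] += h
            n_pairs[b] += 1
    # Semivariance is half the mean squared difference within each bin.
    return [
        (dist_sum[b] / n_pairs[b], sq_diff[b] / (2.0 * n_pairs[b]), n_pairs[b])
        for b in sorted(n_pairs)
    ]
```

Plotting the returned semivariances against distance gives the
exploratory plot described above; an essentially flat curve would suggest
little spatial structure in the residuals.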

* I agree that the micro-scale variation component may be an important 
component. Since the data do not contain any information about whether 
a non-spatial component is part of the signal of interest or just random 
noise, the user has to specify this. This is an issue no matter what 
inference machinery you are using [m.l.e. or fitting to variograms].
I can't see that we disagree about anything here [and if you see my 
paper in the September 2004 issue of the Journal of Computational and 
Graphical Statistics, there is a discussion of micro-scale issues for 
likelihood inference in a spatial Poisson model].

* Nested variogram models. My objection to them is based on what I have 
sometimes seen: a very elaborate fit to the empirical variogram, where 
a lot of effort goes into fitting the variogram away from the origin, 
and where the number of variogram models used in the nested structure 
seems to be decided with only this fit to the empirical variogram in 
mind.
A nested model for the variogram really says that the phenomenon we are 
modelling is Y(x) = Y_1(x) + Y_2(x) + Y_3(x) + Y_4(x) etc., where the 
different components have different spatial structures.
Rather than letting the empirical variogram decide the number of 
components, shouldn't we start thinking about the data-generating 
mechanisms instead?
When there is more than one spatial component Y_i(x), shouldn't we 
attempt to interpret the different components?
And what about the implicit additivity assumption on the components when 
using a nested model? [The data-generating mechanism may suggest 
otherwise ...]
A blind use of nested variogram models seems silly to me.
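
As an illustration of what the nested model implies: if the components
Y_i(x) are assumed mutually uncorrelated (an assumption added here, and
exactly the kind that should be checked against the data-generating
mechanism), the variogram of Y is the sum of the component variograms.
The sketch below evaluates such a sum in plain Python. The function names
and the choice of exponential and spherical components are hypothetical,
not taken from any package.

```python
import math


def exponential(h, sill, range_):
    """Exponential variogram component (range_ is the scale parameter)."""
    return sill * (1.0 - math.exp(-h / range_))


def spherical(h, sill, range_):
    """Spherical variogram component; reaches its sill exactly at range_."""
    if h >= range_:
        return sill
    r = h / range_
    return sill * (1.5 * r - 0.5 * r ** 3)


def nested_variogram(h, nugget, components):
    """Sum of a nugget and several components, each (model, sill, range).

    Assumes the underlying components Y_i are mutually uncorrelated.
    """
    if h == 0:
        return 0.0  # a variogram is zero at distance zero by definition
    return nugget + sum(model(h, sill, rng) for model, sill, rng in components)
```

Each entry in `components` corresponds to one Y_i(x), so every term added
to the list is a claim about a separate process that ought to have an
interpretation.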

* Fitting a nested variogram model. If you want to use such a model, 
you may fit the parameters by maximum likelihood, which was one point I 
tried to make in my previous mail. I see now that I may have stressed 
that point a bit too hard.
I expect that, for some data sets, a procedure for finding the maximum 
of the likelihood might have convergence problems because the 
parameters are poorly identified. So good starting values are probably 
needed, but from your previous e-mail I see that there is a similar 
issue when fitting to variograms. As you wrote, good starting values 
can be found by fitting a nested model by eye. I also have to admit 
that there currently seems to be no procedure available in R packages 
for fitting nested variogram models by maximum likelihood [so we are 
lagging behind in that respect].
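
For concreteness, here is a minimal sketch of the objective such a
maximum likelihood fit would minimise. It is plain Python, not an
existing R routine, and it rests on assumptions added for illustration: a
Gaussian random field, a known constant mean, and a nested covariance
made of a nugget plus exponential components. All function names are
hypothetical.

```python
import math


def nested_exp_cov(h, nugget, components):
    """Covariance at distance h: nugget plus exponential (sill, range) terms."""
    c = sum(sill * math.exp(-h / rng) for sill, rng in components)
    return c + nugget if h == 0 else c


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def neg_log_likelihood(coords, values, mean, nugget, components):
    """Gaussian negative log-likelihood under the nested covariance model."""
    n = len(values)
    cov = [
        [nested_exp_cov(math.dist(coords[i], coords[j]), nugget, components)
         for j in range(n)]
        for i in range(n)
    ]
    l = cholesky(cov)
    # Forward substitution: solve L z = (y - mean), so z'z is the quadratic form.
    z = []
    for i in range(n):
        s = values[i] - mean - sum(l[i][k] * z[k] for k in range(i))
        z.append(s / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return 0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

This function would be handed to a general-purpose numerical optimiser,
with the nugget, sills and ranges as free parameters and starting values
taken from a fit by eye. The dense Cholesky step costs on the order of
n^3 operations, so the sketch is only practical for small data sets.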

* Using the likelihood function: a certain type of book and paper 
about geostatistics may have emphasised the likelihood function too 
strongly.
Having been brought up as a statistician, I find using the likelihood 
function for inference the natural thing to do. But I have also been 
taught to be careful about the model.
A model should capture the important structure of the data [here you 
need input from subject-matter people]. Considering and investigating 
the structure of a model in its many aspects is where we should spend 
our time.
I applaud the final sentence in your e-mail: "Geostatistics is about 
modelling what's out there."

* Last comment: your suggested comparison (ML without nested vs. 
nested models, traditionally fit) misses the point entirely, since it 
would be a comparison of two different models rather than of two 
procedures for inference.

Best regards
Ole

Edzer J. Pebesma wrote:

>
>
> Ole F. Christensen wrote:
>
>>
>> I have no experience fitting nested variogram models myself, but my 
>> general opinion is that nested variograms aren't really useful, since 
>> what matters the most is to make a good fit of the empirical variogram 
>> near the origin. And if one really wants to make a very careful fit of 
>> a variogram-model to the data, then the likelihood function should be 
>> used rather than fitting to the empirical variogram.
>
>
> This reasoning has been put forward in the 1999 book by Michael Stein,
> which contains, besides this one, a few very provocative statements,
> such as "forget about sample variograms, only look at likelihood
> profiles". Although I like the book, the problem I have with it is that
> it contains hardly any analysis of real data. The argument therefore is
> based on theory; mathematicians do that, and they may prove right.
>
> However, nested variograms have been very useful in the past,
> especially for describing spatial variability in larger data sets.
> There are theoretical arguments for using them, think e.g. of the
> nugget effect: it consists of measurement error (a "true" nugget
> effect) and spatially correlated microvariation: a nested variogram
> model with a range so small that it's usually not detected by the data;
> see Cressie (1993) for more on this. Given it's not in the data, ML or
> REML will never pick it up, it's only something you can (and should)
> impose when you know for instance the true measurement error from other
> sources than the observed data.
>
> I would like to see papers where both approaches (ML without nested vs.
> nested models, traditionally fit) were compared with large data sets;
> I find it hard to embrace theoretical ideas without having seen them
> work in practice.
>
> Geostatistics is about modelling what's out there.
> -- 
> Edzer
>

-- 
Ole F. Christensen
BiRC - Bioinformatics Research Center
University of Aarhus