[R-sig-Geo] cross validation gstat

Edzer Pebesma edzer.pebesma at uni-muenster.de
Mon Feb 23 20:31:42 CET 2009


ddepew at sciborg.uwaterloo.ca wrote:
> Hi list,
> A quick question regarding n-fold validation...
> I've seen several papers suggest the LOOCV is too optimistic. Is
> n-fold closer to a "true" validation?
I don't think "true" validation exists; could you explain what it is? If
you mean having a completely independent set of observations not
involved in forming the predictions, then there are two issues: (i) how
to form this set from the total set (how to select it, and how large
should it be?), and (ii) you are forming validation statistics without
using all the information you could use.

In the book by Hastie, Tibshirani and Friedman (The Elements of
Statistical Learning) it is argued (in the context of regression models)
that LOOCV often results in many models that are almost identical,
whereas n-fold with low n results in somewhat more different models. I
don't recall that they gave a statistical/theoretical argument for why
this difference among models is a good thing.

One issue with n-fold using random folds (as gstat does) is that the
result varies if you repeat the procedure. That is obvious, but it also
makes the outcome a bit of a gamble. Which one to pick? Look at
distributions of CV statistics?
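
To illustrate what looking at such a distribution could mean, here is a
minimal sketch in Python (standard library only; it is not gstat code).
It assumes synthetic data and uses inverse distance weighting as a
simple stand-in for kriging; idw_predict and kfold_rmse are hypothetical
helper names. It repeats 5-fold CV with fresh random folds and compares
the spread of the RMSE with that of LOOCV, which has no randomness in
its folds.

```python
import math
import random
import statistics

def idw_predict(train, x, y, power=2.0):
    """Inverse-distance-weighted prediction at (x, y); a stand-in for kriging."""
    num = den = 0.0
    for tx, ty, tz in train:
        d = math.hypot(tx - x, ty - y)
        if d == 0.0:
            return tz  # exact hit on an observation
        w = d ** -power
        num += w * tz
        den += w
    return num / den

def kfold_rmse(data, k, rng):
    """RMSE over all held-out points for one random assignment to k folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            x, y, z = data[i]
            sq_err.append((idw_predict(train, x, y) - z) ** 2)
    return math.sqrt(sum(sq_err) / len(sq_err))

rng = random.Random(42)

# Synthetic observations (assumption): smooth surface plus noise.
data = []
for _ in range(60):
    x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    data.append((x, y, math.sin(x) + 0.5 * y + rng.gauss(0, 0.3)))

rmse_5fold = [kfold_rmse(data, 5, rng) for _ in range(50)]
rmse_loo = [kfold_rmse(data, len(data), rng) for _ in range(3)]

print("5-fold RMSE over 50 repeats: min %.3f, median %.3f, max %.3f"
      % (min(rmse_5fold), statistics.median(rmse_5fold), max(rmse_5fold)))
print("LOOCV RMSE over 3 repeats:", ["%.3f" % r for r in rmse_loo])
```

The 5-fold values differ from run to run, while the LOOCV values agree
up to floating-point rounding, so reporting a single n-fold run hides
the fold-to-fold variability.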

I think when you look at CV statistics, you need to ask why you are
doing it; often it is because you want to find out how well the model
performs in a predictive setting. In that case, prediction at locations
very close to measurements is often something that cannot be
cross-validated at all when the data are collected somewhat regularly in
space.
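
A small Python sketch of this last point, assuming a hypothetical 6 x 6
sampling grid with a spacing of 10 units (nearest_other is an
illustrative helper, not part of any package): in LOOCV, the nearest
remaining observation to each left-out point is never closer than one
grid spacing.

```python
import math

spacing = 10.0  # assumed grid spacing
grid = [(i * spacing, j * spacing) for i in range(6) for j in range(6)]

def nearest_other(points, k):
    """Distance from point k to the closest of the remaining points."""
    px, py = points[k]
    return min(math.hypot(px - qx, py - qy)
               for m, (qx, qy) in enumerate(points) if m != k)

# Distance to the nearest retained observation for every left-out point.
loo_dist = [nearest_other(grid, k) for k in range(len(grid))]
print(min(loo_dist), max(loo_dist))
```

Both values equal the grid spacing, so the CV errors say nothing about
how well the model predicts at shorter distances.
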
> I am assuming that it uses the variogram that is constructed using ALL
> data, so my assumption is that the variogram is not re-fit for each
> n-fold before estimation...
>
That is correct. Please send me code with variogram re-estimation when
you have it. ;-)
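
For what re-estimation per fold would involve, a rough Python sketch
follows (standard library only; it is not gstat code and does not do the
kriging step). Everything in it is an assumption made for illustration:
the synthetic data, the helper names sample_variogram and
fit_exponential, the exponential model without nugget, and the crude
grid search over the range parameter. The variogram is fitted once on
all data and once on the training part of each of 5 random folds.

```python
import math
import random

def sample_variogram(data, width, n_bins):
    """Method-of-moments semivariance per distance bin: (mean h, gamma)."""
    sums = [0.0] * n_bins
    dists = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(data)):
        xi, yi, zi = data[i]
        for j in range(i + 1, len(data)):
            xj, yj, zj = data[j]
            h = math.hypot(xi - xj, yi - yj)
            b = int(h / width)
            if b < n_bins:
                sums[b] += 0.5 * (zi - zj) ** 2
                dists[b] += h
                counts[b] += 1
    return [(dists[b] / counts[b], sums[b] / counts[b])
            for b in range(n_bins) if counts[b] > 0]

def fit_exponential(vg, ranges):
    """Least-squares fit of gamma(h) = sill * (1 - exp(-h / a)), no nugget."""
    best = None
    for a in ranges:
        f = [1.0 - math.exp(-h / a) for h, _ in vg]
        # For a fixed range a, the least-squares sill has a closed form.
        sill = sum(fi * g for fi, (_, g) in zip(f, vg)) / sum(fi * fi for fi in f)
        sse = sum((sill * fi - g) ** 2 for fi, (_, g) in zip(f, vg))
        if best is None or sse < best[0]:
            best = (sse, sill, a)
    return best[1], best[2]  # (sill, range)

rng = random.Random(1)

# Synthetic, spatially structured observations (assumption).
data = []
for _ in range(80):
    x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    data.append((x, y, math.sin(x / 2) + math.cos(y / 3) + rng.gauss(0, 0.2)))

ranges = [0.5 * m for m in range(1, 21)]  # candidate range parameters
idx = list(range(len(data)))
rng.shuffle(idx)
folds = [idx[i::5] for i in range(5)]

all_fit = fit_exponential(sample_variogram(data, 1.0, 6), ranges)
fold_fits = []
for fold in folds:
    held_out = set(fold)
    train = [data[i] for i in range(len(data)) if i not in held_out]
    # Re-estimate the variogram from the training part only.
    fold_fits.append(fit_exponential(sample_variogram(train, 1.0, 6), ranges))
    # A full version would now krige the held-out points with this model.

print("all data (sill, range):", all_fit)
for n, fit in enumerate(fold_fits, start=1):
    print("fold", n, "(sill, range):", fit)
```

Comparing the per-fold fits with the all-data fit shows how much the
variogram model itself depends on which observations are held out.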

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de


