[R-sig-Geo] Problem with gstat variogram estimation

Edzer Pebesma edzer.pebesma at uni-muenster.de
Mon Oct 12 16:38:04 CEST 2009


John, thanks for sharing this with r-sig-geo.

As Thierry mentioned, the default model fitting procedure (fit.variogram
in package gstat) uses weighted least squares, with weights proportional
to N_h/(h^2). This explains why the first lag gets so much weight.

For pure nugget models, this of course makes little sense; for other
models it often does. Argument fit.method gives you somewhat more
control. Give it value 1 to have N_h weights; give it value 6 to do
unweighted averaging (I agree that this information should be in the
fit.variogram documentation). The SSErr values will be uncomparable
accross different weighting schemes, as you might expect.
--
Edzer

Carson, John wrote:
> I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets.  In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.  
>
> Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected.  Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget.  The fitted models are listed below.  Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).  
>
> It seems to me that this must be due to a bug in the GSTAT code.  I pointed this out to Edzer Pebesma, and he asked me to post it here.
>
> The variograms are 
>
> tmp.vgm
> [[1]]
>   np       dist      gamma dir.hor dir.ver  id
> 1  4   1.414214 0.14174537       0       0 PC1
> 2  2  44.742603 6.70989788       0       0 PC1
> 3  2  57.707880 1.76351594       0       0 PC1
> 4  4  59.987678 1.52197310       0       0 PC1
> 5  3  71.512518 1.21348268       0       0 PC1
> 6  1  84.852877 0.05381849       0       0 PC1
> 7  1  97.266495 1.21827622       0       0 PC1
> 8  3 112.237133 5.07947925       0       0 PC1
> 9 18 121.478856 1.93707676       0       0 PC1
>
> [[2]]
>   np       dist      gamma dir.hor dir.ver  id
> 1  4   1.414214 0.09725079       0       0 PC2
> 2  2  44.742603 0.33598072       0       0 PC2
> 3  2  57.707880 0.39088727       0       0 PC2
> 4  4  59.987678 0.87315735       0       0 PC2
> 5  3  71.512518 0.14944845       0       0 PC2
> 6  1  84.852877 0.19809863       0       0 PC2
> 7  1  97.266495 0.63557814       0       0 PC2
> 8  3 112.237133 1.92063948       0       0 PC2
> 9 18 121.478856 0.65468693       0       0 PC2
>
> [[3]]
>   np       dist       gamma dir.hor dir.ver  id
> 1  4   1.414214 0.035250817       0       0 PC3
> 2  2  44.742603 0.105299796       0       0 PC3
> 3  2  57.707880 0.020245674       0       0 PC3
> 4  4  59.987678 0.124159836       0       0 PC3
> 5  3  71.512518 0.008112554       0       0 PC3
> 6  1  84.852877 0.034337591       0       0 PC3
> 7  1  97.266495 0.053879459       0       0 PC3
> 8  3 112.237133 0.021922987       0       0 PC3
> 9 18 121.478856 0.085270969       0       0 PC3
>
>
> But the fitted models are:
>
> tmp.vgm.fit
> [[1]]
>   model     psill range
> 1   Nug 0.1483120     0
>
> [[2]]
>   model      psill range
> 1   Nug 0.09849419     0
>
> [[3]]
>   model      psill range
> 1   Nug 0.03535234     0
>
>
>
>
> John H. Carson Jr., PhD
> Senior Statistician
> Applied Sciences & Engineering 
> Shaw Environmental & Infrastructure
> 16406 US Rte 224 East
> Findlay, OH 45840
> Phone 419-425-6156
> Fax 419-425-6085
> john.carson at shawgrp.com
>
> http://www.shawgrp.com/
> Shaw(tm) a world of Solutions(tm)
>
>  
>
> ****Internet Email Confidentiality Footer****
> Privileged/Confidential Information may be contained in this
> message. If you are not the addressee indicated in this message (or
> responsible for delivery of the message to such person), you may
> not copy or deliver this message to anyone. In such case, you
> should destroy this message and notify the sender by reply email.
> Please advise immediately if you or your employer do not consent to
> Internet email for messages of this kind. Opinions, conclusions and
> other information in this message that do not relate to the
> official business of The Shaw Group Inc. or its subsidiaries shall
> be understood as neither given nor endorsed by it.
> ______________________________________ The Shaw Group Inc.
> http://www.shawgrp.com
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>   

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de



More information about the R-sig-Geo mailing list