[R-sig-Geo] Problem with gstat variogram estimation

Paul Hiemstra p.hiemstra at geo.uu.nl
Mon Oct 12 16:41:15 CEST 2009


Edzer Pebesma wrote:
> John, thanks for sharing this with r-sig-geo.
>
> As Thierry mentioned, the default model fitting procedure (fit.variogram
> in package gstat) uses weighted least squares, with weights proportional
> to N_h/(h^2). This explains why the first lag gets so much weight.
>
> For pure nugget models, this of course makes little sense; for other
> models it often does. Argument fit.method gives you somewhat more
> control. Give it value 1 to have N_h weights; give it value 6 to do
> unweighted averaging (I agree that this information should be in the
>   
It is listed in the gstat documentation:

http://www.gstat.org/gstat.pdf

on page 42, in the middle.

cheers,
Paul
> fit.variogram documentation). The SSErr values will be uncomparable
> accross different weighting schemes, as you might expect.
> --
> Edzer
>
> Carson, John wrote:
>   
>> I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets.  In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.  
>>
>> Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected.  Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget.  The fitted models are listed below.  Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).  
>>
>> It seems to me that this must be due to a bug in the GSTAT code.  I pointed this out to Edzer Pebesma, and he asked me to post it here.
>>
>> The variograms are 
>>
>> tmp.vgm
>> [[1]]
>>   np       dist      gamma dir.hor dir.ver  id
>> 1  4   1.414214 0.14174537       0       0 PC1
>> 2  2  44.742603 6.70989788       0       0 PC1
>> 3  2  57.707880 1.76351594       0       0 PC1
>> 4  4  59.987678 1.52197310       0       0 PC1
>> 5  3  71.512518 1.21348268       0       0 PC1
>> 6  1  84.852877 0.05381849       0       0 PC1
>> 7  1  97.266495 1.21827622       0       0 PC1
>> 8  3 112.237133 5.07947925       0       0 PC1
>> 9 18 121.478856 1.93707676       0       0 PC1
>>
>> [[2]]
>>   np       dist      gamma dir.hor dir.ver  id
>> 1  4   1.414214 0.09725079       0       0 PC2
>> 2  2  44.742603 0.33598072       0       0 PC2
>> 3  2  57.707880 0.39088727       0       0 PC2
>> 4  4  59.987678 0.87315735       0       0 PC2
>> 5  3  71.512518 0.14944845       0       0 PC2
>> 6  1  84.852877 0.19809863       0       0 PC2
>> 7  1  97.266495 0.63557814       0       0 PC2
>> 8  3 112.237133 1.92063948       0       0 PC2
>> 9 18 121.478856 0.65468693       0       0 PC2
>>
>> [[3]]
>>   np       dist       gamma dir.hor dir.ver  id
>> 1  4   1.414214 0.035250817       0       0 PC3
>> 2  2  44.742603 0.105299796       0       0 PC3
>> 3  2  57.707880 0.020245674       0       0 PC3
>> 4  4  59.987678 0.124159836       0       0 PC3
>> 5  3  71.512518 0.008112554       0       0 PC3
>> 6  1  84.852877 0.034337591       0       0 PC3
>> 7  1  97.266495 0.053879459       0       0 PC3
>> 8  3 112.237133 0.021922987       0       0 PC3
>> 9 18 121.478856 0.085270969       0       0 PC3
>>
>>
>> But the fitted models are:
>>
>> tmp.vgm.fit
>> [[1]]
>>   model     psill range
>> 1   Nug 0.1483120     0
>>
>> [[2]]
>>   model      psill range
>> 1   Nug 0.09849419     0
>>
>> [[3]]
>>   model      psill range
>> 1   Nug 0.03535234     0
>>
>>
>>
>>
>> John H. Carson Jr., PhD
>> Senior Statistician
>> Applied Sciences & Engineering 
>> Shaw Environmental & Infrastructure
>> 16406 US Rte 224 East
>> Findlay, OH 45840
>> Phone 419-425-6156
>> Fax 419-425-6085
>> john.carson at shawgrp.com
>>
>> http://www.shawgrp.com/
>> Shaw(tm) a world of Solutions(tm)
>>
>>  
>>
>> ****Internet Email Confidentiality Footer****
>> Privileged/Confidential Information may be contained in this
>> message. If you are not the addressee indicated in this message (or
>> responsible for delivery of the message to such person), you may
>> not copy or deliver this message to anyone. In such case, you
>> should destroy this message and notify the sender by reply email.
>> Please advise immediately if you or your employer do not consent to
>> Internet email for messages of this kind. Opinions, conclusions and
>> other information in this message that do not relate to the
>> official business of The Shaw Group Inc. or its subsidiaries shall
>> be understood as neither given nor endorsed by it.
>> ______________________________________ The Shaw Group Inc.
>> http://www.shawgrp.com
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>   
>>     
>
>   


-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



More information about the R-sig-Geo mailing list