[R-sig-Geo] Problem with gstat variogram estimation

Carson, John John.Carson at shawgrp.com
Mon Oct 12 18:33:14 CEST 2009


Edzer,

Thank you.  I am now looking at REML since I have small data sets and I really need to use the duplicate data.

In the follow on I will need to accommodate adaptive sampling weights.  The first stage selection is systematic (with some minor modifications).  The second stage is sampling areas adjacent to any primary location that exceeds a threshold for one of the variables of interest.  Another reason, for example, that one might weight would be to reflect the number of (physical) component samples in a composite soil sample.  This may vary from laboratory result to laboratory result.  Is there any way to accommodate this in gstat?

Thanks,
John

John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering 
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
 
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
 
 

-----Original Message-----
From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] 
Sent: Monday, October 12, 2009 10:38 AM
To: Carson, John
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] Problem with gstat variogram estimation

John, thanks for sharing this with r-sig-geo.

As Thierry mentioned, the default model fitting procedure (fit.variogram
in package gstat) uses weighted least squares, with weights proportional
to N_h/(h^2). This explains why the first lag gets so much weight.

For pure nugget models, this of course makes little sense; for other
models it often does. Argument fit.method gives you somewhat more
control. Give it value 1 to have N_h weights; give it value 6 to do
unweighted averaging (I agree that this information should be in the
fit.variogram documentation). The SSErr values will be uncomparable
accross different weighting schemes, as you might expect.
--
Edzer

Carson, John wrote:
> I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets.  In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.  
>
> Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected.  Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget.  The fitted models are listed below.  Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).  
>
> It seems to me that this must be due to a bug in the GSTAT code.  I pointed this out to Edzer Pebesma, and he asked me to post it here.
>
> The variograms are 
>
> tmp.vgm
> [[1]]
>   np       dist      gamma dir.hor dir.ver  id
> 1  4   1.414214 0.14174537       0       0 PC1
> 2  2  44.742603 6.70989788       0       0 PC1
> 3  2  57.707880 1.76351594       0       0 PC1
> 4  4  59.987678 1.52197310       0       0 PC1
> 5  3  71.512518 1.21348268       0       0 PC1
> 6  1  84.852877 0.05381849       0       0 PC1
> 7  1  97.266495 1.21827622       0       0 PC1
> 8  3 112.237133 5.07947925       0       0 PC1
> 9 18 121.478856 1.93707676       0       0 PC1
>
> [[2]]
>   np       dist      gamma dir.hor dir.ver  id
> 1  4   1.414214 0.09725079       0       0 PC2
> 2  2  44.742603 0.33598072       0       0 PC2
> 3  2  57.707880 0.39088727       0       0 PC2
> 4  4  59.987678 0.87315735       0       0 PC2
> 5  3  71.512518 0.14944845       0       0 PC2
> 6  1  84.852877 0.19809863       0       0 PC2
> 7  1  97.266495 0.63557814       0       0 PC2
> 8  3 112.237133 1.92063948       0       0 PC2
> 9 18 121.478856 0.65468693       0       0 PC2
>
> [[3]]
>   np       dist       gamma dir.hor dir.ver  id
> 1  4   1.414214 0.035250817       0       0 PC3
> 2  2  44.742603 0.105299796       0       0 PC3
> 3  2  57.707880 0.020245674       0       0 PC3
> 4  4  59.987678 0.124159836       0       0 PC3
> 5  3  71.512518 0.008112554       0       0 PC3
> 6  1  84.852877 0.034337591       0       0 PC3
> 7  1  97.266495 0.053879459       0       0 PC3
> 8  3 112.237133 0.021922987       0       0 PC3
> 9 18 121.478856 0.085270969       0       0 PC3
>
>
> But the fitted models are:
>
> tmp.vgm.fit
> [[1]]
>   model     psill range
> 1   Nug 0.1483120     0
>
> [[2]]
>   model      psill range
> 1   Nug 0.09849419     0
>
> [[3]]
>   model      psill range
> 1   Nug 0.03535234     0
>
>
>
>
> John H. Carson Jr., PhD
> Senior Statistician
> Applied Sciences & Engineering 
> Shaw Environmental & Infrastructure
> 16406 US Rte 224 East
> Findlay, OH 45840
> Phone 419-425-6156
> Fax 419-425-6085
> john.carson at shawgrp.com
>
> http://www.shawgrp.com/
> Shaw(tm) a world of Solutions(tm)
>
>  
>
> ****Internet Email Confidentiality Footer****
> Privileged/Confidential Information may be contained in this
> message. If you are not the addressee indicated in this message (or
> responsible for delivery of the message to such person), you may
> not copy or deliver this message to anyone. In such case, you
> should destroy this message and notify the sender by reply email.
> Please advise immediately if you or your employer do not consent to
> Internet email for messages of this kind. Opinions, conclusions and
> other information in this message that do not relate to the
> official business of The Shaw Group Inc. or its subsidiaries shall
> be understood as neither given nor endorsed by it.
> ______________________________________ The Shaw Group Inc.
> http://www.shawgrp.com
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>   

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de



More information about the R-sig-Geo mailing list