[R-sig-Geo] Distance binning within variograms

Wed Sep 10 13:23:50 CEST 2014

Hi there!

First, thank you very much for answering my last question in such 
depth. As suggested, I am proceeding with variograms, and I now have to 
decide how to bin my data.

First, my data is organised like this:

http://s1.postimg.org/bkjgdavvj/image.png

The grid is 10 m x 10 m, subdivided into 30 plots, each of which is 
sampled twice in close proximity, giving 60 samples per grid.
The distances of my 60*59/2 point pairs within the grid are like this:

summary(dist(aug.dis))
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.500   3.307   5.035   5.126   6.872  11.920

Here is the histogram, classified by 50cm lags:

http://s28.postimg.org/49iqyz8jh/Hist_aug.jpg

For binning the distances, I have read that "As a guide, Isaacs and 
Srivastava* suggest that if the samples are located on a pseudoregular 
grid, the grid spacing is usually a good lag size. If the sampling is 
random (as in this case), the average distance between neighboring 
samples can be used as an initial lag size." 
(http://webhelp.esri.com/arcgisdesktop/9.3/tutorials/geostat/Geostat_3_2.htm).
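
To make the second suggestion concrete, here is a minimal sketch of how 
the average distance between neighboring samples could be computed in 
base R. The coordinates are made up for illustration (they are not my 
data), and the names xy, nn and lag0 are just chosen here:

```r
# Hypothetical coordinates (metres) standing in for the real sample locations
set.seed(1)
xy <- cbind(x = runif(60, 0, 10), y = runif(60, 0, 10))

d <- as.matrix(dist(xy))   # full 60 x 60 matrix of pairwise distances
diag(d) <- Inf             # ignore the zero distance of each point to itself
nn <- apply(d, 1, min)     # distance from each point to its nearest neighbour
lag0 <- mean(nn)           # candidate initial lag size
lag0
```

With paired samples such as mine, this average would presumably be 
dominated by the within-pair distance, so it is only a starting point.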

Now, if I go with the basic variogram in gstat with everything on default:

variogram(mydata[,1]~1, mydata)

  np     dist       gamma dir.hor dir.ver   id
1   31 0.500000 0.001113978       0       0 var1
2   10 0.769093 0.000617368       0       0 var1
3   18 1.128502 0.001945660       0       0 var1
4   27 1.281212 0.002560873       0       0 var1
5   50 1.601944 0.002974914       0       0 var1
6   51 1.940722 0.001848663       0       0 var1
7   54 2.236781 0.002047786       0       0 var1
8   63 2.540425 0.002047856       0       0 var1
9   54 2.830686 0.002353565       0       0 var1
10  78 3.088837 0.002293626       0       0 var1
11 101 3.425051 0.002118441       0       0 var1
12  60 3.708046 0.002641097       0       0 var1
13  83 4.008823 0.001944181       0       0 var1
14  38 4.266185 0.002779340       0       0 var1

I am getting 14 bins, separated by about 20-35 cm, so it seems 
that no absolute lag size was used, correct? So I wonder how the above 
statement fits into the picture here.
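
As far as I understand the gstat documentation, variogram() by default 
sets the cutoff to one third of the diagonal of the box spanning the 
data and divides it into 15 equal lags, which would give bins of 
roughly this width. A rough sketch of that arithmetic in base R, again 
on made-up coordinates and without calling gstat itself:

```r
# Hypothetical coordinates (metres); not the real data
set.seed(1)
xy <- cbind(x = runif(60, 0, 10), y = runif(60, 0, 10))

# Diagonal of the bounding box spanning the points
bbox_diag <- sqrt(sum(apply(xy, 2, function(v) diff(range(v)))^2))

cutoff <- bbox_diag / 3    # assumed default cutoff
width  <- cutoff / 15      # assumed default lag width
breaks <- width * 0:15     # 16 bin boundaries, i.e. 15 bins

d  <- as.vector(dist(xy))
d  <- d[d <= max(breaks)]            # keep pairs within the cutoff
np <- table(cut(d, breaks = breaks)) # number of point pairs per bin
np
```

Bins that contain no point pairs would not show up in the variogram 
output, which could be why I see 14 rows instead of 15.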

For the actual model fitting I am using "autofitVariogram" from the 
package automap. There I can choose the binning myself, but only via 
the minimum number of point pairs per bin.
The thing is, I don't get the same bin sizes as with gstat, which raises 
the question of what actually makes sense for my data.
Example:
autofitVariogram(mydata[,1]~1, mydata, model=c("Exp", "Sph"),
                          GLS.model=TRUE,
                          miscFitOptions=list(merge.small.bins=TRUE, min.np.bin=1),
                          cutoff=8, width=1,
                          verbose=TRUE)

Even with the non-conservative setting of min.np.bin (the minimum number 
of point pairs needed to form a bin) of 1, I am getting:

np      dist        gamma dir.hor dir.ver   id
1  31 0.5000000 0.0010506776       0       0 var1
2   2 0.6800000 0.0001693078       0       0 var1
3  19 0.9544668 0.0005623298       0       0 var1
4  44 1.3611090 0.0012611147       0       0 var1
5 133 1.9898870 0.0010649182       0       0 var1
6 134 2.7270631 0.0011212414       0       0 var1
7 153 3.4178791 0.0008814734       0       0 var1
8 240 4.2357844 0.0011453654       0       0 var1

This has much lower resolution than the gstat binning, especially at 
greater distances. If I set min.np.bin too high, I even lose the 
second distance class (in both cases the second row shows a severe drop 
in semivariance), making this question even more crucial. I really 
don't want to mess this up, so if you have any advice, please let me 
know. Here are example variograms from the data above.

*with automap, min.np.bin=10*
http://s10.postimg.org/6kxj37i6x/Bin_10.jpg

*with automap, min.np.bin=20*
http://s10.postimg.org/f1x1e4mvt/bin20.jpg

*with gstat, plot(variogram), default options*
http://s10.postimg.org/gsg2fm4ex/sdpep.jpg

As you can see, different conclusions could be drawn (in the 2nd 
picture, a trend towards a spatial model can be observed).

I am also wondering: so far, the only way I have to judge whether my 
data follow a non-random spatial process is visual. Is there any 
numerical parameter of the variogram that backs up my visual judgment, 
as with Moran's I or Geary's C?
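
To show what kind of number I mean, here is a minimal base R sketch of 
Moran's I with a permutation test. Everything in it is an assumption 
for illustration: the coordinates and values are simulated, the 
inverse-distance weights are just one possible choice, and moran_i is a 
helper written here, not a function from a package:

```r
set.seed(1)
xy <- cbind(runif(60, 0, 10), runif(60, 0, 10))  # hypothetical coordinates
z  <- rnorm(60)                                   # hypothetical measurements

w <- 1 / as.matrix(dist(xy))   # inverse-distance weights
diag(w) <- 0                   # a point is not its own neighbour

# Moran's I: (n / sum of weights) * weighted cross-products / sum of squares
moran_i <- function(z, w) {
  zc <- z - mean(z)
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

obs  <- moran_i(z, w)
perm <- replicate(999, moran_i(sample(z), w))  # shuffle values over locations
p    <- (sum(perm >= obs) + 1) / (length(perm) + 1)  # one-sided p-value
c(I = obs, p = p)
```

I am looking for something comparable that can be read off the 
variogram itself.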

Thank you very much!

Tim
