[R-sig-Geo] Beginner's question on choosing the correct test, part.2

Fri Aug 22 15:25:03 CEST 2014

Please excuse my beginner-level questions.
I am stuck at the very beginning, i.e. choosing the correct neighbour list. I 
tried knearneigh (with several values of k), tri2nb, and dnearneigh. Yesterday 
I was kindly shown that for some datasets the choice of neighbour list has 
little impact, but for my data it clearly does matter.

My original plot consists of 59 points semi-regularly distributed over a 
10 x 10 m area. The points are arranged so that some have two equally 
near nearest neighbours, while others have just one.
The minimum inter-point distance is 50 cm, the maximum about 12 m.
My question is whether the bacterial communities sampled at these points 
follow non-random spatial processes. I will repeat the analysis for every 
abundant species in each population.

Here are the neighbour-list plots (from top to bottom):

http://s2.postimg.org/tkxqw82ih/image.jpg

1. Original coords list
2. *tri2nb* on the original (non-randomised) x,y matrix
3. *tri2nb* on a randomised coordinates list
4. *dnearneigh* on the original coordinates list
5. knearneigh with k=10

The tri2nb help file recommends randomising the row order for some 
data, so I did that, although I assume that my first three x,y points 
can easily be triangulated:

>coords[1:3,]
           x    y
  [1,] 0.835 8.75
  [2,] 0.835 8.25
  [3,] 2.505 8.75
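One point worth checking here: when the coordinates are shuffled before tri2nb, the data values must be shuffled with the same permutation, because the resulting neighbour list is indexed by the new row order. In the code further down, coords.rand is shuffled but ap[,1] is not, which would pair each value with another point's neighbours. A minimal sketch of keeping both in step, on simulated stand-in data (the grid, `abund`, the seed and the helper `moranI` are hypothetical, not the original apr.D and ap):

```r
library(spdep)

# Hypothetical stand-in for apr.D / ap[, 1]: 59 jittered grid points on a
# 10 x 10 m plot, plus a simulated abundance with a west-east trend.
set.seed(42)
g <- expand.grid(x = seq(0.6, 9.4, length.out = 8),
                 y = seq(0.6, 9.4, length.out = 8))
g <- g[-sample(nrow(g), 5), ]
coords <- as.matrix(g) + matrix(runif(2 * nrow(g), -0.3, 0.3), ncol = 2)
abund <- coords[, "x"] + rnorm(nrow(coords), sd = 1)

# Draw ONE permutation and apply it to both coordinates and data
perm <- sample(nrow(coords))
coords.rand <- coords[perm, ]
abund.rand <- abund[perm]

tri <- tri2nb(coords)
tri.rand <- tri2nb(coords.rand)

# Hypothetical helper: Moran's I from the analytical test (no permutation noise)
moranI <- function(x, nb) {
  moran.test(x, nb2listw(nb))$estimate[["Moran I statistic"]]
}
I.orig <- moranI(abund, tri)              # original order
I.rand <- moranI(abund.rand, tri.rand)    # shuffled coords, data re-ordered too
I.mismatch <- moranI(abund, tri.rand)     # shuffled coords, data NOT re-ordered
round(c(original = I.orig, shuffled = I.rand, mismatched = I.mismatch), 4)
```

For points in general position the Delaunay triangulation does not depend on input order, so the original and correctly re-ordered versions should agree; only the mismatched pairing changes the statistic.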

The knn plots varied a lot depending on the chosen k, and I just cannot 
get a "feel" for what the correct k is for my purpose. I tested several 
values of k between 2 and 30; moran.mc gave significant test statistics 
in every case, approaching 1 as the number of neighbours decreased 
(which makes sense, I guess).
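The k comparison can be made systematic by looping over k and tabulating Moran's I and the permutation p-value for each. A minimal sketch on simulated stand-in data (the data and the set of k values are illustrative assumptions, not the original analysis):

```r
library(spdep)

# Hypothetical stand-in data (not the original apr.D / ap)
set.seed(42)
g <- expand.grid(x = seq(0.6, 9.4, length.out = 8),
                 y = seq(0.6, 9.4, length.out = 8))
g <- g[-sample(nrow(g), 5), ]
coords <- as.matrix(g) + matrix(runif(2 * nrow(g), -0.3, 0.3), ncol = 2)
abund <- coords[, "x"] + rnorm(nrow(coords), sd = 1)

ks <- c(2, 4, 6, 8, 10, 15, 20, 30)   # illustrative range, as in the post
# knearneigh may warn when k is large relative to the number of points
res <- t(sapply(ks, function(k) {
  lw <- nb2listw(knn2nb(knearneigh(coords, k = k)))  # default style "W"
  mc <- moran.mc(abund, lw, nsim = 999)
  c(k = k, I = unname(mc$statistic), p = mc$p.value)
}))
res
```

Plotting `res[, "I"]` against `res[, "k"]` gives a quick picture of how the statistic changes as the neighbourhood widens.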

So the question is: what is the most appropriate neighbour list? The 
same question applies to the next step, nb2listw and the choice of 
style. A priori, I cannot say whether I want to weight anything; 
all points are equally important.
I also do not know (yet) the underlying spatial process here. In the 
end, I need a robust indicator of whether or not my species data show 
spatial autocorrelation.
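nb2listw offers several styles: "W" (row-standardised, the default), "B" (binary), "C" (globally standardised), "U" (C scaled so all weights sum to one), "S" (variance-stabilising) and "minmax". A quick way to see how sensitive the test is to this choice, for one neighbour list, is to loop over the styles. A minimal sketch on simulated stand-in data, using a Delaunay list purely for illustration:

```r
library(spdep)

# Hypothetical stand-in data (not the original apr.D / ap)
set.seed(42)
g <- expand.grid(x = seq(0.6, 9.4, length.out = 8),
                 y = seq(0.6, 9.4, length.out = 8))
g <- g[-sample(nrow(g), 5), ]
coords <- as.matrix(g) + matrix(runif(2 * nrow(g), -0.3, 0.3), ncol = 2)
abund <- coords[, "x"] + rnorm(nrow(coords), sd = 1)

nb <- tri2nb(coords)   # number of neighbours varies from point to point
styles <- c("W", "B", "C", "U", "S", "minmax")
res <- t(sapply(styles, function(s) {
  mc <- moran.mc(abund, nb2listw(nb, style = s), nsim = 999)
  c(I = unname(mc$statistic), p = mc$p.value)
}))
round(res, 4)
```

"B", "C" and "U" are constant multiples of one another, and Moran's I is unchanged when every weight is rescaled by the same constant, so these three give identical statistics. For a knn list, where every point has exactly k neighbours, "W" is likewise just "B" divided by k, which may explain why changing the style made little difference.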

Here is the code I used:

library(sp)      # coordinates()
library(spdep)

# apr.D is my x,y list; ap is my species matrix

coords <- coordinates(apr.D)
coords.rand <- coords[sample(nrow(coords)), ]

# distances for dnearneigh
ap.dis <- dist(coords)
min.dist <- min(ap.dis)
max.dist <- max(ap.dis)

# creating nb lists
tri <- tri2nb(coords)
tri.rand <- tri2nb(coords.rand)
dnn <- dnearneigh(coords, min.dist, max.dist)
knn10 <- knearneigh(coords, k=10)

*I then ran nb2listw for each of the methods above with default 
options, followed by a permutation Moran's I test (moran.mc) on the 
first species in the dataset, as it is not normally distributed:*

list1 <- nb2listw(tri)
list2 <- nb2listw(tri.rand)
list3 <- nb2listw(dnn)
list4 <- nb2listw(knn2nb(knn10))

moran.mc(ap[,1], list1, nsim=999)
moran.mc(ap[,1], list2, nsim=999)
moran.mc(ap[,1], list3, nsim=999)
moran.mc(ap[,1], list4, nsim=999)
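Before comparing the test results, it may help to compare the neighbour lists themselves, for example how many neighbours each point receives. A sketch that builds three of the lists the same way as above, on simulated stand-in data (the data are hypothetical):

```r
library(spdep)

# Hypothetical stand-in data (not the original apr.D)
set.seed(42)
g <- expand.grid(x = seq(0.6, 9.4, length.out = 8),
                 y = seq(0.6, 9.4, length.out = 8))
g <- g[-sample(nrow(g), 5), ]
coords <- as.matrix(g) + matrix(runif(2 * nrow(g), -0.3, 0.3), ncol = 2)

d <- dist(coords)
nbs <- list(
  tri   = tri2nb(coords),
  dnn   = dnearneigh(coords, min(d), max(d)),   # same bounds as in the post
  knn10 = knn2nb(knearneigh(coords, k = 10))
)
# Mean, minimum and maximum number of neighbours per point
links <- t(sapply(nbs, function(nb) {
  n.nb <- card(nb)
  c(mean = mean(n.nb), min = min(n.nb), max = max(n.nb))
}))
links
```

card() returns the number of neighbours of each point. For knn10 every point has exactly 10; for a Delaunay triangulation the average is below 6; with min.dist and max.dist as bounds, dnearneigh links each point to close to n - 1 = 58 others.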

*Here is the output*

>moran.mc(ap[,1], list1, nsim=999)

	Monte-Carlo simulation of Moran's I

data:  ap[, 1]
weights: list1  (tri)
number of simulations + 1: 1000

statistic = 0.8924, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

>moran.mc(ap[,1], list2, nsim=999)

	Monte-Carlo simulation of Moran's I

data:  ap[, 1]
weights: list2  (tri.rand)
number of simulations + 1: 1000

statistic = 0.0598, observed rank = 859, p-value = 0.141
alternative hypothesis: greater

>moran.mc(ap[,1], list3, nsim=999)

	Monte-Carlo simulation of Moran's I

data:  ap[, 1]
weights: list3  (dnn)
number of simulations + 1: 1000

statistic = -0.0358, observed rank = 1, p-value = 0.999
alternative hypothesis: greater

>moran.mc(ap[,1], list4, nsim=999)

	Monte-Carlo simulation of Moran's I

data:  ap[, 1]
weights: list4  (knn, k=10)
number of simulations + 1: 1000

statistic = 0.8194, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
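One possible reason the dnn result stands apart: with d1 = min.dist and d2 = max.dist, dnearneigh treats (almost) every pair of points as neighbours. In the limiting case of a complete neighbour list with row-standardised weights (style "W", the nb2listw default), Moran's I no longer depends on the data at all. With $z_i = x_i - \bar{x}$:

```latex
\begin{align*}
I &= \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij}\, z_i z_j}{\sum_i z_i^2},
  \qquad S_0 = \sum_i \sum_j w_{ij}, \\
w_{ij} &= \frac{1}{n-1}\ (i \neq j) \;\Rightarrow\; S_0 = n, \\
\sum_{i \neq j} z_i z_j &= \Big(\sum_i z_i\Big)^2 - \sum_i z_i^2 = -\sum_i z_i^2
  \quad \text{since } \sum_i z_i = 0, \\
I &= \frac{n}{n} \cdot \frac{1}{n-1} \cdot \frac{-\sum_i z_i^2}{\sum_i z_i^2}
   = -\frac{1}{n-1} \approx -0.017 \quad (n = 59).
\end{align*}
```

Every permutation then yields the same value, so such a list cannot reveal autocorrelation. The actual dnn list may leave out a few pairs (for instance those exactly at min.dist, depending on how dnearneigh treats the lower bound), so the observed statistic need not equal this value exactly.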

As you can see, the results differ, and the row randomisation had a 
large impact on the triangulated neighbour list. tri2nb (on the original 
order) and knn seem to agree. Judging from the plots, the dnn-derived list 
links points over many different distances, which might swamp the 
analysis. I tried all available style options in nb2listw, and none of 
them changed the overall result. So the biggest question is: what would 
be the most appropriate approach to creating a neighbour list?
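On the dnn list specifically: the code above uses max.dist as the upper bound, so every pair of points falls inside the distance band. A commonly used alternative is to take the largest first-nearest-neighbour distance as the upper bound, i.e. the smallest band in which no point is left without a neighbour. A minimal sketch on simulated stand-in data (not the original plot; the tolerance factor is an assumption added to guard against floating-point ties):

```r
library(spdep)

# Hypothetical stand-in data (not the original apr.D / ap)
set.seed(42)
g <- expand.grid(x = seq(0.6, 9.4, length.out = 8),
                 y = seq(0.6, 9.4, length.out = 8))
g <- g[-sample(nrow(g), 5), ]
coords <- as.matrix(g) + matrix(runif(2 * nrow(g), -0.3, 0.3), ncol = 2)
abund <- coords[, "x"] + rnorm(nrow(coords), sd = 1)

# Each point's distance to its single nearest neighbour
k1 <- knn2nb(knearneigh(coords, k = 1))
nn.dist <- unlist(nbdists(k1, coords))
# Smallest upper bound that leaves no point isolated
# (tiny tolerance guards against floating-point ties at the bound)
all.linked <- max(nn.dist) * (1 + 1e-9)

dnn.band <- dnearneigh(coords, 0, all.linked)
summary(card(dnn.band))   # neighbours per point, far fewer than n - 1
mc <- moran.mc(abund, nb2listw(dnn.band), nsim = 999)
mc
```

Whether this particular band is meaningful for the sampled communities is a separate, subject-matter question; the sketch only shows how to avoid a band that contains every pair.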

I really need some expert advice here. Thank you very much!

Tim
