[R-sig-Geo] Beginner's question on choosing the correct test, part.2

Fri Aug 22 21:31:01 CEST 2014

On Fri, 22 Aug 2014, Tim Richter-Heitmann wrote:

> Excuse my beginner-level questions.
> I am stuck at the beginning, i.e. choosing the correct neighbor list. I
> tried knearneigh (with several values of k), tri2nb, and dnearneigh.
> Yesterday I was nicely shown that in some datasets the choice of nb list
> does not have much impact, but for my data it seems to matter.
>

You really need to consider more carefully which spatial processes you 
think are present. Your assumption that a lattice-based approach to 
possible spatial autocorrelation is sensible needs more thought.

>
> My original plot consists of 59 points semi-regularly distributed on a
> 10x10m area.

The area in question is 10x10m. What is the support of the observations 
(see Gotway & Young 2002 for definitions)? Are the observation points 
chosen by sampling, or are they representative points for a tessellation 
of the area? If they can be seen as a tessellation, they represent spatial 
objects, each with different observed levels of the variables of interest 
(are these counts of bacteria?). If the tessellation view is relevant, 
then a lattice-based view of autocorrelation using neighbours may make 
sense.

If, on the other hand, the observation points are locations from which 
samples have been taken (and samples could have been taken at other 
locations), we may rather be looking not at spatial objects but at a 
spatial field, with continuous spatial processes for which the idea of 
neighbours is less "natural". In this case, examining a variogram of the 
variable of interest, possibly after accounting for background variables 
(a mean model), may be more sensible. The need to consider a mean model 
applies to the lattice case too.
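
As a rough sketch of what examining a variogram involves, the base R code 
below bins the pairwise distances and averages half the squared differences 
within each bin. The coordinates and values are simulated stand-ins (xy and 
z are hypothetical names, not your data), and the 0.5 m bin width and 7 m 
cutoff are arbitrary assumptions. In practice a package such as gstat would 
normally be used rather than hand-rolled code.

```r
set.seed(1)

# Simulated stand-in data: 59 points on a 10 x 10 m plot (assumption)
n  <- 59
xy <- cbind(x = runif(n, 0, 10), y = runif(n, 0, 10))
z  <- rnorm(n)

# Pairwise distances and semivariance contributions, in the same pair order
d  <- as.vector(dist(xy))
sv <- as.vector(dist(z))^2 / 2

# Distance bins (arbitrary width and cutoff); pairs beyond 7 m are dropped
breaks    <- seq(0, 7, by = 0.5)
bin       <- cut(d, breaks)
gamma_hat <- tapply(sv, bin, mean)   # empirical semivariance per bin
n_pairs   <- table(bin)              # number of point pairs per bin

print(data.frame(bin = names(gamma_hat),
                 n_pairs = as.vector(n_pairs),
                 gamma = as.vector(gamma_hat)))
```

If the semivariance rises with distance and then levels off, that suggests 
spatial dependence up to roughly that range; values that stay flat across 
bins suggest little spatial structure at the sampled scales.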

If your data are more like observations from a field, where their placing 
across that continuous field could have been different, a geostatistical 
approach may make more sense. If they are lattice based with a tessellated 
form - think trial plots - then it is much easier to conceptualise 
neighbours.
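
To illustrate why neighbours are easy to conceptualise in the trial-plot 
case, here is a minimal base R sketch for a hypothetical 6 x 6 grid of plots 
with simulated values. Plots sharing an edge are treated as neighbours (an 
assumption for this example), and Moran's I is computed directly from 
row-standardised weights. The names grid, A, W and I_hat are made up for the 
illustration; spdep would normally do this work.

```r
set.seed(42)

# Hypothetical 6 x 6 layout of trial plots
nr <- 6
nc <- 6
grid <- expand.grid(row = seq_len(nr), col = seq_len(nc))

# Plots sharing an edge are neighbours: Manhattan distance of exactly 1
A <- as.matrix(dist(grid, method = "manhattan")) == 1

# Row-standardised weights: each plot's neighbour weights sum to 1
W <- A / rowSums(A)

# Simulated variable of interest (no spatial structure built in)
z  <- rnorm(nrow(grid))
zc <- z - mean(z)

# Moran's I = (n / S0) * sum_ij w_ij zc_i zc_j / sum_i zc_i^2
I_hat <- (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
print(I_hat)
```

Here the neighbour definition follows from the layout itself, which is what 
is missing when points could equally well have been placed elsewhere.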

A further open question is whether your variables of interest are 
presence/absence, count, or well-behaved hump-shaped continuous values.

The question of spatially structured random effects in modelling can be 
addressed in many ways, but I'm not convinced here that you cannot choose 
another approach, given that there is no obvious "neighbour" relationship. 
In addition, you have a time component, which I think suggests that the 
lattice approach may not be very helpful, especially as I think you 
said that the observed locations may change over time. I don't think that 
there is a quick fix for this.

Roger

> The points are arranged so that some have two equidistant nearest
> neighbours, others just one.
> The minimum distance is 50 cm, the maximum is about 12 m.
> My question is if bacterial communities sampled at these plots follow
> non-random spatial processes. I will repeat that for every abundant
> species in each population.
>
> Here are nb plots (from top to bottom):
>
> http://s2.postimg.org/tkxqw82ih/image.jpg
>
> 1. Original coords list
> 2. from tri2nb, using the non-randomised x,y matrix
> 3. from tri2nb, using a randomised coordinates list
> 4. from dnearneigh, using the original coordinates list
> 5. from knearneigh with k=10
>
>
> The help file on tri2nb recommended doing a row randomisation for some
> data, so I did that, although I assume that my first three x,y points
> should be easy to triangulate:
>
>> coords[1:3,]
>           x    y
>  [1,] 0.835 8.75
>  [2,] 0.835 8.25
>  [3,] 2.505 8.75
>
>
> The knn plots varied completely depending on the chosen k. I just cannot
> "feel" what the correct k is for my purpose. I tested several values of k
> between 2 and 30, and moran.mc yielded significant test statistics for
> all of them, approaching 1 as the number of neighbours decreased
> (which makes sense, I guess).
>
> So, the question is, what is the most appropriate neighbor list? The
> same question applies to the next step, nb2listw and the choice of
> style, because a priori I cannot say whether I want to weight anything.
> All points are equally important.
> I also do not know (yet) the underlying spatial process here. In the
> end, I need a robust indicator of whether my species data show spatial
> autocorrelation or not.
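
On the style question: Moran's I does not change if all weights are 
multiplied by the same constant, so whenever every point has the same 
number of neighbours (as with knearneigh), binary and row-standardised 
weights should give the same statistic. Styles only start to matter when 
neighbour counts differ between points. A base R illustration with 
simulated data follows; knn_adjacency and moran_I are hypothetical helpers 
written for this sketch, not spdep functions.

```r
set.seed(1)

# Simulated stand-in data (assumption): 59 points on a 10 x 10 m plot
n  <- 59
xy <- cbind(x = runif(n, 0, 10), y = runif(n, 0, 10))
z  <- rnorm(n)

# Binary adjacency matrix linking each point to its k nearest neighbours
knn_adjacency <- function(xy, k) {
  D <- as.matrix(dist(xy))
  A <- matrix(0, nrow(D), ncol(D))
  for (i in seq_len(nrow(D))) {
    nb <- order(D[i, ])[2:(k + 1)]   # position 1 is the point itself
    A[i, nb] <- 1
  }
  A
}

# Moran's I for a data vector z and a weights matrix W
moran_I <- function(z, W) {
  zc <- z - mean(z)
  (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
}

A   <- knn_adjacency(xy, k = 10)
I_B <- moran_I(z, A)                 # binary weights
I_W <- moran_I(z, A / rowSums(A))    # row-standardised weights

print(c(binary = I_B, row_standardised = I_W))
```
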
>
> Here is the code i used:
>
> library(spdep)
>
> # apr.D is my x,y list; ap is my species matrix
>
> coords <- coordinates(apr.D)
> coords.rand <- coords[sample(nrow(coords)), ]
>
> # distances for dnearneigh
> ap.dis <- dist(coords)
> min.dist <- min(ap.dis)
> max.dist <- max(ap.dis)
>
> # creating nb lists
> tri <- tri2nb(coords)
> tri.rand <- tri2nb(coords.rand)
> dnn <- dnearneigh(coords, min.dist, max.dist)
> knn10 <- knearneigh(coords, k=10)
>
> I carried out nb2listw for each of the methods above with default
> options, and then a permutation-based Moran's I test on the first species
> in the dataset, as it is not normally distributed:
>
> list1 <- nb2listw(tri)
> list2 <- nb2listw(tri.rand)
> list3 <- nb2listw(dnn)
> list4 <- nb2listw(knn2nb(knn10))
>
> moran.mc(ap[,1], list1, nsim=999)
> moran.mc(ap[,1], list2, nsim=999)
> moran.mc(ap[,1], list3, nsim=999)
> moran.mc(ap[,1], list4, nsim=999)
>
> Here is the output:
>
>> moran.mc(ap[,1], list1, nsim=999)
>
> 	Monte-Carlo simulation of Moran's I
>
> data:  ap[, 1]
> weights: list1  (tri)
> number of simulations + 1: 1000
>
> statistic = 0.8924, observed rank = 1000, p-value = 0.001
> alternative hypothesis: greater
>
>> moran.mc(ap[,1], list2, nsim=999)
>
> 	Monte-Carlo simulation of Moran's I
>
> data:  ap[, 1]
> weights: list2  (tri.rand)
> number of simulations + 1: 1000
>
> statistic = 0.0598, observed rank = 859, p-value = 0.141
> alternative hypothesis: greater
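
One thing worth checking in this comparison: coords.rand reorders the 
coordinates, but ap[,1] stays in its original row order, so the neighbour 
list and the data may no longer refer to the same points. The base R sketch 
below illustrates the effect with simulated data, using a simple 
k-nearest-neighbour weights matrix in place of tri2nb. knn_weights and 
moran_I are hypothetical helpers written for this illustration, not spdep 
functions.

```r
set.seed(1)

# Simulated stand-in data (assumption): values with an east-west trend
n  <- 59
xy <- cbind(x = runif(n, 0, 10), y = runif(n, 0, 10))
z  <- xy[, "x"] + rnorm(n, sd = 0.5)

# Row-standardised k-nearest-neighbour weights matrix
knn_weights <- function(xy, k) {
  D <- as.matrix(dist(xy))
  W <- matrix(0, nrow(D), ncol(D))
  for (i in seq_len(nrow(D))) {
    nb <- order(D[i, ])[2:(k + 1)]   # position 1 is the point itself
    W[i, nb] <- 1 / k
  }
  W
}

# Moran's I for a data vector z and a weights matrix W
moran_I <- function(z, W) {
  zc <- z - mean(z)
  (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
}

perm <- sample(n)   # random row order, as in coords[sample(nrow(coords)), ]

I_orig     <- moran_I(z,       knn_weights(xy, 4))          # original order
I_mismatch <- moran_I(z,       knn_weights(xy[perm, ], 4))  # coords shuffled only
I_aligned  <- moran_I(z[perm], knn_weights(xy[perm, ], 4))  # both shuffled

print(c(original = I_orig, mismatch = I_mismatch, aligned = I_aligned))
```

If the data are reordered with the same permutation as the coordinates, the 
statistic should match the unshuffled result; if only the coordinates are 
shuffled, the test is effectively run on randomly relabelled data.
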
>
>> moran.mc(ap[,1], list3, nsim=999)
>
> 	Monte-Carlo simulation of Moran's I
>
> data:  ap[, 1]
> weights: list3  (dnn)
> number of simulations + 1: 1000
>
> statistic = -0.0358, observed rank = 1, p-value = 0.999
> alternative hypothesis: greater
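
A possible explanation for the near-zero statistic here: with the upper 
bound set to the maximum inter-point distance, dnearneigh links (almost) 
every point to every other point. When every point is a neighbour of every 
other point and the weights are row-standardised, Moran's I equals 
-1/(n-1) whatever the data look like, so the test cannot detect anything. A 
base R check with simulated values follows; moran_I is a hypothetical 
helper written for this sketch, not an spdep function.

```r
set.seed(1)

# Moran's I for a data vector z and a weights matrix W
moran_I <- function(z, W) {
  zc <- z - mean(z)
  (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
}

n <- 59

# Complete graph: every point neighbours every other, row-standardised
W_full <- matrix(1 / (n - 1), n, n)
diag(W_full) <- 0

# Two very different simulated variables (assumptions for illustration)
z_noise <- rnorm(n)           # no structure
z_walk  <- cumsum(rnorm(n))   # strongly structured along the index

print(c(noise    = moran_I(z_noise, W_full),
        walk     = moran_I(z_walk, W_full),
        expected = -1 / (n - 1)))
```

An upper bound closer to the typical spacing of the points would give a 
sparser neighbour list that can actually distinguish near from far.
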
>
>> moran.mc(ap[,1], list4, nsim=999)
>
> 	Monte-Carlo simulation of Moran's I
>
> data:  ap[, 1]
> weights: list4  (knn, k=10)
> number of simulations + 1: 1000
>
> statistic = 0.8194, observed rank = 1000, p-value = 0.001
> alternative hypothesis: greater
>
>
>
>
> As you can see, the results differ, and the randomisation had a large
> impact on the triangulated neighbour list. tri2nb and knn seem to be
> in agreement. Judging from the plots, the dnn-derived list seems to span
> many different distances, which might clog the analysis. I tried all
> possible style options for nb2listw, and they didn't change the
> effective result. So, the biggest question is, what would be the most
> appropriate approach to create an nb list?
>
>
> I really need some expert advice here. Thank you very much!
>
> Tim

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no