[R-sig-Geo] Beginner's question on choosing the correct test, part.2
Roger Bivand
Roger.Bivand at nhh.no
Fri Aug 22 21:31:01 CEST 2014
On Fri, 22 Aug 2014, Tim Richter-Heitmann wrote:
> Excuse my beginner-level questions.
> I am stuck at the beginning, i.e. choosing the correct neighbour list. I
> tried knearneigh (with several "k"s), tri2nb, and dnearneigh. Yesterday,
> I was kindly shown that for some datasets the way the nb list is created
> does not always have much impact, but it seems
> that for my data it does matter.
>
You really need to consider more carefully which spatial processes are
thought to be present. Your assumption that lattice-based approaches to
possible spatial autocorrelation are sensible here needs more thought.
>
> My original plot consists of 59 points semi-regularly distributed on a
> 10x10m area.
The area in question is 10x10m. What is the support of the observations
(see Gotway & Young 2002 for definitions)? Are the observation points
chosen by sampling, or are they representative points for a tessellation of
the area? If they can be seen as a tessellation, they represent spatial
objects, each with different observed levels of the variables of interest
(are these counts of bacteria?). If the tessellation view is relevant, then
a lattice-based view of autocorrelation using neighbours may make sense.
If, on the other hand, the observation points are locations from which
samples have been taken (and samples could have been taken at other
locations), we may rather be looking not at spatial objects but at a
spatial field, with continuous spatial processes for which the idea of
neighbours is less "natural". In this case, examining a variogram of the
variable of interest, possibly qualified by background variables (a mean
model; this consideration applies to the lattice case too), may be more
sensible.
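As a rough illustration of what examining a variogram involves, here is a
minimal base-R sketch. It is not taken from the thread: the coordinates and
the variable z are simulated (hypothetical), and the number of lag bins and
the cutoff at half the maximum distance are arbitrary choices. In practice a
package such as gstat would normally be used, with a mean model if needed.

```r
# Hypothetical data: 59 points in a 10 x 10 m area, smooth signal plus noise.
set.seed(1)
n <- 59
coords <- cbind(x = runif(n, 0, 10), y = runif(n, 0, 10))
z <- sin(coords[, "x"] / 3) + cos(coords[, "y"] / 3) + rnorm(n, sd = 0.2)

# Pairwise distances and half squared differences (the semivariance cloud).
# dist() returns pairs in the same order for both objects.
d <- as.vector(dist(coords))
g <- as.vector(dist(z))^2 / 2

# Bin the distances into 10 lags up to half the maximum distance (assumption).
breaks <- seq(0, max(d) / 2, length.out = 11)
lag <- cut(d, breaks = breaks, include.lowest = TRUE)

# Empirical semivariogram: mean distance, mean semivariance, pairs per lag.
emp_vgm <- data.frame(
  dist  = as.vector(tapply(d, lag, mean)),
  gamma = as.vector(tapply(g, lag, mean)),
  np    = as.vector(table(lag))
)
print(emp_vgm)
```

If gamma rises with distance before levelling off, nearby observations are
more alike than distant ones; a flat cloud gives little evidence of spatial
structure at the sampled scales.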
If your data are more like observations from a field, where their placing
across that continuous field could have been different, a geostatistical
approach may make more sense. If they are lattice-based with a tessellated
form - think trial plots - then it is much easier to conceptualise
neighbours.
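To make the trial-plot case concrete, here is a small base-R sketch, again
on hypothetical data rather than the poster's: a 6 x 6 grid of plots, rook
(shared-edge) neighbours, row-standardised weights, and a permutation test
of Moran's I computed by hand. The grid size, the gradient in z and the
number of permutations are all assumptions made for illustration.

```r
# Hypothetical 6 x 6 grid of trial plots, identified by their cell centres.
grid <- expand.grid(col = 1:6, row = 1:6)
n <- nrow(grid)

# Rook contiguity: plots sharing an edge have centres exactly one cell apart.
D <- as.matrix(dist(grid))
W <- (abs(D - 1) < 1e-8) * 1
W <- W / rowSums(W)  # row-standardise the weights

# Hypothetical response: a left-to-right gradient plus noise.
set.seed(42)
z <- grid$col + rnorm(n)

# Moran's I for a given weights matrix.
moran_i <- function(z, W) {
  zc <- z - mean(z)
  (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
}
I_obs <- moran_i(z, W)

# Reference distribution: permute the values over the fixed set of plots.
I_perm <- replicate(999, moran_i(sample(z), W))
p_value <- (sum(I_perm >= I_obs) + 1) / (999 + 1)
c(I = I_obs, p = p_value)
```

Here the neighbour definition follows directly from the plot layout, which
is what makes the lattice view easy to justify in such a design.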
A further open question is whether your variables of interest are
presence/absence, count, or well-behaved hump-shaped continuous values.
The question of spatially structured random effects in modelling can be
addressed in many ways, but I'm not convinced here that you cannot choose
another approach, given that there is no obvious "neighbour" relationship.
In addition, you have a time component, which I think suggests that the
lattice approach may not be very helpful, especially as I think you
said that the observed locations may change over time. I don't think that
there is a quick fix for this.
Roger
> The points are arranged so that some have two absolute
> nearest neighbours, others just one.
> The minimum distance is 50 cm, the maximum is about 12 m.
> My question is if bacterial communities sampled at these plots follow
> non-random spatial processes. I will repeat that for every abundant
> species in each population.
>
> Here are nb plots (from top to bottom):
>
> http://s2.postimg.org/tkxqw82ih/image.jpg
>
> 1. Original coords list
> 2. from tri2nb, using the non-randomised x,y matrix
> 3. from tri2nb, using a randomised coordinates list
> 4. from dnearneigh, using the original coordinates list
> 5. from knearneigh with k=10
>
>
> The help file on tri2nb recommended doing a row randomisation for some
> data, so I did that, although I assume that my first three x,y points
> should be easy to triangulate:
>
>> coords[1:3,]
> x y
> [1,] 0.835 8.75
> [2,] 0.835 8.25
> [3,] 2.505 8.75
>
>
> The knn plots varied completely depending on the chosen "k". I just cannot
> "feel" what the correct k is for my purpose. I tested several values of k
> between 2 and 30, and moran.mc yielded significant test statistics in
> every case, approaching 1 as the number of neighbours decreased
> (which makes sense, I guess).
>
> So, the question is, what is the most appropriate neighbour list? The
> same question applies to the next step, nb2listw and the choice of
> style, because a priori I cannot say whether I want to weight anything.
> All points are equally important.
> I also do not know (yet) the underlying spatial process here. In the
> end, I need a robust indicator of whether my species data show spatial
> autocorrelation or not.
>
> Here is the code i used:
>
> library(spdep)
>
> # apr.D is my x,y list; ap is my species matrix
>
> coords <- coordinates(apr.D)
> coords.rand <- coords[sample(nrow(coords)),]
>
> # distances for dnearneigh
> ap.dis <- dist(coords)
> min.dist <- min(unlist(dist(coords)))
> max.dist <- max(unlist(dist(coords)))
>
> # creating nb lists
> tri <- tri2nb(coords)
> tri.rand <- tri2nb(coords.rand)
> dnn <- dnearneigh(coords, min.dist, max.dist)
> knn10 <- knearneigh(coords, k=10)
>
> I ran nb2listw with default options for each of the methods above,
> and then a permutation test of Moran's I on the first species in the
> dataset, as it is not normally distributed:
>
> list1 <- nb2listw(tri)
> list2 <- nb2listw(tri.rand)
> list3 <- nb2listw(dnn)
> list4 <- nb2listw(knn2nb(knn10))
>
> moran.mc(ap[,1], list1, nsim=999)
> moran.mc(ap[,1], list2, nsim=999)
> moran.mc(ap[,1], list3, nsim=999)
> moran.mc(ap[,1], list4, nsim=999)
>
> Here is the output:
>
>> moran.mc(ap[,1], list1, nsim=999)
>
> Monte-Carlo simulation of Moran's I
>
> data: ap[, 1]
> weights: list1 (tri)
> number of simulations + 1: 1000
>
> statistic = 0.8924, observed rank = 1000, p-value = 0.001
> alternative hypothesis: greater
>
>> moran.mc(ap[,1], list2, nsim=999)
>
> Monte-Carlo simulation of Moran's I
>
> data: ap[, 1]
> weights: list2 (tri.rand)
> number of simulations + 1: 1000
>
> statistic = 0.0598, observed rank = 859, p-value = 0.141
> alternative hypothesis: greater
>
>> moran.mc(ap[,1], list3, nsim=999)
>
> Monte-Carlo simulation of Moran's I
>
> data: ap[, 1]
> weights: list3 (dnn)
> number of simulations + 1: 1000
>
> statistic = -0.0358, observed rank = 1, p-value = 0.999
> alternative hypothesis: greater
>
>> moran.mc(ap[,1], list4, nsim=999)
>
> Monte-Carlo simulation of Moran's I
>
> data: ap[, 1]
> weights: list4 (knn, k=10)
> number of simulations + 1: 1000
>
> statistic = 0.8194, observed rank = 1000, p-value = 0.001
> alternative hypothesis: greater
>
>
>
>
> As you can see, the results differ, and the randomisation had a large
> impact on the triangulated neighbour list. tri2nb and knn seem to be
> in agreement. Judging from the plots, the dnn-derived list seems to link
> points over many different distances, which might clog the analysis. I
> tried all possible style options in nb2listw, and they didn't change the
> effective result. So, the biggest question is, what would be the most
> appropriate approach to create an nb list?
>
>
> I really need some expert advice here. Thank you very much!
>
> Tim
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no