# [R-sig-Geo] Null distribution of a categorical variable to test local spatial association

Babak Naimi naimi.b at gmail.com
Thu Apr 7 14:00:30 CEST 2016

```Dear list,

I am exploring the solutions to generate a null distribution of a spatial
categorical variable (e.g., land use map). I am going to use the null
distribution in the procedure of testing whether local spatial association
(measured by a new statistic named ELSA) at each location of the
categorical variable is significant, by simply using a non-parametric
bootstrap randomization approach to test ELSA against the null
distribution. The procedure is as follows:
1- we quantify Ei (ELSA at site i in the original data) at each location;
2- generate the null distribution;
3- using a Monte Carlo simulation with R runs, we use a bootstrapping
   procedure through which a sample is drawn from the null distribution at
   each run;
4- at each run we quantify E*i (ELSA at site i in the bootstrap sample);
5- count the number of runs in which Ei >= E*i holds, written #(Ei >= E*i);
6- and finally, calculate a pseudo p-value using
   (1 + #(Ei >= E*i)) / (R + 1)
This is a straightforward approach that is also used in significance
testing based on other indices (e.g., Local Moran's I).

The problem I have is how to generate the null distribution. I know that
the approach used in other studies is based on shuffling the spatial
locations randomly to get the null distribution, but to me it does not
make sense, as it can cause a problem for cases with an uneven
distribution of events (different classes in the categorical maps). For
example, imagine a map with two categories, A and B, where class A covers
95% of the entire study area while class B is found in only 5% of the
area. If we shuffle the locations randomly, do we really get a null
distribution of the classes? Or does the supposedly null distribution have
a high chance of still being spatially autocorrelated for class A?

The solution I have figured out so far is to assume under the null
hypothesis that the events are distributed uniformly and independently
over the given locations, so that the null distribution can be constructed
by drawing a sample at each location from the events (m categorical
classes: c1, ..., cm), each having a probability of 1/m.

However, since I am not a statistician, I was wondering whether this is a
valid solution and, if so, whether there is a difference between using
this approach for a truly categorical variable and for a variable
discretized from a continuous one.

Best regards,
Babak

------------
Babak Naimi
Center for Macroecology, Evolution, and Climate (CMEC),
University of Copenhagen, Denmark
Website: www.r-gis.net

```
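The six-step procedure and the two candidate null models described in the message can be sketched roughly as follows. This is only an illustration under several assumptions: it is written in Python with the standard library rather than R, the function `local_stat` is a hypothetical stand-in (the share of neighbouring cells whose class differs from the focal cell) and is not the actual ELSA statistic, and all function names are made up for this example. As in the message, a lower value is taken to mean stronger local association, so the count is #(Ei >= E*i).

```python
import random


def local_stat(grid, i, j):
    """Hypothetical stand-in for ELSA (NOT the real statistic).

    Returns the share of the (up to 8) neighbours of cell (i, j) whose
    class differs from the focal cell. Lower means stronger association.
    """
    rows, cols = len(grid), len(grid[0])
    diff = total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                total += 1
                diff += grid[ni][nj] != grid[i][j]
    return diff / total if total else 0.0


def null_by_shuffling(grid, rng):
    """Null model 1: permute the observed values across locations.

    Class frequencies are preserved (e.g. 95% A and 5% B stay 95% / 5%).
    """
    cols = len(grid[0])
    flat = [v for row in grid for v in row]
    rng.shuffle(flat)
    return [flat[r * cols:(r + 1) * cols] for r in range(len(grid))]


def null_by_uniform_draw(grid, rng, classes):
    """Null model 2: draw every cell independently, each of the m classes
    with probability 1/m. Observed class frequencies are NOT preserved.
    """
    return [[rng.choice(classes) for _ in row] for row in grid]


def pseudo_p_values(grid, make_null, runs, rng):
    """Steps 1-6: per-cell pseudo p-value (1 + #(Ei >= E*i)) / (R + 1)."""
    rows, cols = len(grid), len(grid[0])
    # Step 1: statistic on the original data.
    observed = [[local_stat(grid, i, j) for j in range(cols)]
                for i in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for _ in range(runs):
        # Steps 2-3: one sample from the chosen null model.
        sim = make_null(grid, rng)
        for i in range(rows):
            for j in range(cols):
                # Steps 4-5: compare Ei with E*i and count.
                if observed[i][j] >= local_stat(sim, i, j):
                    counts[i][j] += 1
    # Step 6: pseudo p-value per cell.
    return [[(1 + c) / (runs + 1) for c in row] for row in counts]


# Example map from the message: 95% class A, 5% class B (clustered here).
rng = random.Random(42)
grid = [["A"] * 10 for _ in range(10)]
for i, j in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]:
    grid[i][j] = "B"

p_shuffle = pseudo_p_values(grid, null_by_shuffling, 199, rng)
p_uniform = pseudo_p_values(
    grid, lambda g, r: null_by_uniform_draw(g, r, ["A", "B"]), 199, rng
)
```

Comparing `p_shuffle` and `p_uniform` for the same cells shows how much the choice of null model matters on an unbalanced map: the shuffled maps keep the 95% / 5% split, whereas the uniform draws produce maps that are roughly half A and half B.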