[R-sig-Geo] Very dissimilar results with Moran's I and EBI

Mon Mar 23 11:43:38 CET 2015

On Mon, 23 Mar 2015, Roger Bivand wrote:

Reason for difference inline below:

> On Mon, 23 Mar 2015, Richard Asturia wrote:
>
>>  Dear list,
>>
>>  I have used a colleague's data to compare geographical clustering using
>>  different ways of detecting clustering. Today, I found very dissimilar
>>  results for the statistic of Moran's I and of the EBI version of Moran's I
>>  (using spdep's EBImoran.mc), what I understand was not supposed to be the
>>  case. In fact, with EBI it was twice as big (1.23 against 0.69).
>>
>>  The issue is that the EBI statistic was supposed to be similar to Moran's
>>  I
>>  in what regards the value of Z, while not necessarily in what regards the
>>  p-value.
>>
>>  I would like to ask your advice on what seems to be going wrong. In order
>>  to give a reproducible example without having to make you download my
>>  shapefile, I wrote a code that creates precisely the same neighborhood
>>  structure of the map I am using.
>
> Thanks for trying to provide an example. Unfortunately, you posted HTML, 
> which makes it very hard to copy and paste from any email client. The correct 
> way to move an nb object would be:
>
> nb <- structure(list(21L, c(8L, 9L, 11L, 18L, 20L, 22L, 25L), c(10L,
> 14L, 26L), c(8L, 25L), c(6L, 12L, 15L, 23L, 26L), c(5L, 12L,
> 13L, 18L, 27L), c(13L, 16L, 18L, 24L), c(2L, 4L, 20L, 25L), c(2L,
> 11L, 16L, 17L, 20L, 21L), c(3L, 14L, 28L), c(2L, 9L, 16L, 18L
> ) , c(5L, 6L, 13L, 23L), c(6L, 7L, 12L, 18L, 19L, 23L, 24L, 27L
> ) , c(3L, 10L, 26L, 28L), c(5L, 26L), c(7L, 9L, 11L, 18L, 21L),
>     c(9L, 20L, 21L), c(2L, 6L, 7L, 11L, 13L, 16L, 22L, 27L),
>     c(13L, 23L, 24L, 26L), c(2L, 8L, 9L, 17L, 21L), c(1L, 9L,
>     16L, 17L, 20L), c(2L, 18L, 25L), c(5L, 12L, 13L, 19L, 26L
>    ) , c(7L, 13L, 19L), c(2L, 4L, 8L, 22L), c(3L, 5L, 14L, 15L,
>    19L, 23L), c(6L, 13L, 18L), c(10L, 14L)), class = "nb", region.id = 
> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
> "25", "26", "27", "28"), call = NA, sym = TRUE)
>
> constructed by using dput() - so using spaces permitting line breaks, and for 
> the others:
>
> cases <- c(5, 55, 6, 14, 22, 2, 8, 159, 158, 0, 15, 85, 21, 8, 3, 27,
> 1128, 18, 16, 101, 72, 14, 1, 55, 18, 6, 0, 4)
>
> pop <- c(865, 4534, 9708, 1799, 4929, 2154, 4159, 10414, 10917, 1420,
> 1166, 26098, 8417, 3049, 1853, 2419, 49108, 2982, 4218, 6824,
> 6923, 1961, 1259, 13661, 2418, 3672, 811, 1375)
>
> and
>
> my.listW <- nb2listw(nb, style="W")
>
> Could you please say why you expect the values of I to be similar? Is this 
> claimed in:
>
> Assunção RM, Reis EA 1999 A new proposal to adjust Moran's I for
> population density. Statistics in Medicine 18, pp. 2147-2162

On p. 2157, EBI is n/S0 * sum(z_i*lag(z_i))/sum((z_i - \bar(z))^2)

so sum(z_i*lag(z_i)) is implemented as it stands. If we change this to 
differences from the mean of z:

sum((z_i - \bar(z))*(lag(z_i - \bar(z))))

we get the same result as moran() on the EB smoothed rates. So the 
question is whether subtracting the mean of the EB smoothed rates in the 
numerator was intended but that there is a misprint in the paper, or 
whether it was intentionally omitted. The implementation follows the 
printed formula.

I'll add an option to the function to permit the subtraction of the mean 
smoothed rate.

Roger

>
> Running
>
> summary(glm(cases ~ 1 + offset(log(pop)), family=quasipoisson))
>
> suggests that the data are overdispersed, so the assumptions for empirical 
> Bayes adjustment may not be met. For Negative Binomial alternatives, see the 
> DCluster package.
>
> Hope this helps,
>
> Roger
>
> PS. It would be helpful to know why you are asking, and very helpful to have 
> an affiliation in your signature. At least you use a real name!
>
>>
>>  Any advice will be very much appreciated.
>>
>>  Thanks in advance,
>>
>>  RA
>> 
>> 
>> # THE CODE:
>> # obviously loading the package:
>>  require(spdep)
>>
>>  #this big messy piece of code recreates in matrix form the neighborhood
>>  structure:
>>  neighb.structure
>>  <-matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.142857142857143,0.142857142857143,0,0.142857142857143,0,0,0,0,0,0,0.142857142857143,0,0.142857142857143,0,0.142857142857143,0,0,0.142857142857143,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0.25,0,0.25,0,0,0,0,0,0.25,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0.25,0,0,0,0,0.166666666666667,0,0,0,0,0,0,0,0,0.166666666666667,0,0,0,0,0.166666666666667,0.166666666666667,0,0,0.166666666666667,0.166666666666667,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0,0,0,0
>>  ,0,!
>>  0,0,0,0,0,0.25,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0,0.125,0.125,0,0,0,0,0.125,0,0,0,0,0,0.125,0.125,0,0,0,0.125,0.125,0,0,0.125,0,0,0,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0.2,0,0.2,0,0.2,0,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0.333333333333333,0,0,0,0,0,0,0,0,0.125,0,0,0,0.125,0.125,0,0,0,0.125,0,0.125,0,0,0.125,0,0,0,0,0,0.125,0,0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0.25,0.25,0,0.25,0,0,0,0.2,0,0,0,0,0,0.2,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0.2,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0.33
>>  33!
>>  33333333333,0,0,0,0,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0.25,0,0,0,0,0,0,0,0
>>  ,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0.166666666666667,0,0.166666666666667,0,0,0,0,0,0,0,0,0.166666666666667,0.166666666666667,0,0,0,0.166666666666667,0,0,0,0.166666666666667,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0),28,28)
>>
>>  #recreating the absolute value of the cases of interest in each feature of
>>  the map:
>>  cases <-
>>  c(5,55,6,14,22,2,8,159,158,0,15,85,21,8,3,27,1128,18,16,101,72,14,1,55,18,6,0,4)
>>
>>  #recreating  the absolute value of the population in each feature of the
>>  map:
>>  pop <-
>>  c(865,4534,9708,1799,4929,2154,4159,10414,10917,1420,1166,26098,8417,3049,1853,2419,49108,2982,4218,6824,6923,1961,1259,13661,2418,3672,811,1375)
>>
>>  #recreating the rate values in each feature of the map:
>>  rates <- cases/pop
>>
>>  #generating a listW:
>>  neighb.structure <- t(neighb.structure)
>>  row.names(neighb.structure) <-
>>  my.listW <- mat2listw(neighb.structure)
>>
>>  #running the tests:
>>  moran.test(x=rates, listw=my.listW)
>>  moran.mc(x=rates, listw=my.listW, nsim=999)
>>  EBImoran.mc(n=cases, x=pop, listw=my.listW, nsim=999)
>>
>>   [[alternative HTML version deleted]]
>>
>>  _______________________________________________
>>  R-sig-Geo mailing list
>>  R-sig-Geo at r-project.org
>>  https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>> 
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no