[R-sig-Geo] Very dissimilar results with Moran's I and EBI

Fabricio Vasselai fabriciovasselai at gmail.com
Sun Apr 5 21:35:28 CEST 2015


Dears professor Bivand and Richard,

I just saw these emails today. Although I don't know exactly which was
the authors' intended formula (it is, whether it has a typo in the
paper or not), I can say the following. A couple of years ago, after a
course I took with professor Bivand, I contacted the EBI's authors
asking about whether there was a defined closed range of values for
the EBI, such as Moran's I has the [-1,1] range for row standardized W
matrices. Because I was getting some really high values for EBI, even
when using row standardized weight matrices.

Professor Assunção replied me that he didn't look yet at that
particular question, but he also pointed specifically that an
empirical Bayes estimate is always between the crude rate and the
global average rate, which he claimed that could possibly imply that
Moran's I based on bayesian rates should also be contained between the
bounds of the usual Moran (depending on the weight matrix). The things
is, it seems to me that this scenario he was talking about can only be
true in case we subtract the mean of the EBI smoothed rates, as
professor Bivand has mentioned. Which could suggest that there might
really be a typo in the paper.

In any case I contacted professor Assunção again last week to direct
the question to him. I recommend Richard to do the same. Let's see
what reply we might get. In any case, I will be eagerly waiting for
that new option to the EBImoran.mc, function to permit the subtraction
of the mean smoothed rate. Is there any approximate date for that to
become available? Since I've been working with comparisons between
indices, that could make for an interesting play for my research....

Best regards,

Fabricio Vasselai
Political Science Department
University of São Paulo


2015-03-23 7:43 GMT-03:00 Roger Bivand <Roger.Bivand at nhh.no>:
>
> On Mon, 23 Mar 2015, Roger Bivand wrote:
>
> Reason for difference inline below:
>
>> On Mon, 23 Mar 2015, Richard Asturia wrote:
>>
>>>  Dear list,
>>>
>>>  I have used a colleague's data to compare geographical clustering using
>>>  different ways of detecting clustering. Today, I found very dissimilar
>>>  results for the statistic of Moran's I and of the EBI version of Moran's I
>>>  (using spdep's EBImoran.mc), what I understand was not supposed to be the
>>>  case. In fact, with EBI it was twice as big (1.23 against 0.69).
>>>
>>>  The issue is that the EBI statistic was supposed to be similar to Moran's
>>>  I
>>>  in what regards the value of Z, while not necessarily in what regards the
>>>  p-value.
>>>
>>>  I would like to ask your advice on what seems to be going wrong. In order
>>>  to give a reproducible example without having to make you download my
>>>  shapefile, I wrote a code that creates precisely the same neighborhood
>>>  structure of the map I am using.
>>
>>
>> Thanks for trying to provide an example. Unfortunately, you posted HTML, which makes it very hard to copy and paste from any email client. The correct way to move an nb object would be:
>>
>> nb <- structure(list(21L, c(8L, 9L, 11L, 18L, 20L, 22L, 25L), c(10L,
>> 14L, 26L), c(8L, 25L), c(6L, 12L, 15L, 23L, 26L), c(5L, 12L,
>> 13L, 18L, 27L), c(13L, 16L, 18L, 24L), c(2L, 4L, 20L, 25L), c(2L,
>> 11L, 16L, 17L, 20L, 21L), c(3L, 14L, 28L), c(2L, 9L, 16L, 18L
>> ) , c(5L, 6L, 13L, 23L), c(6L, 7L, 12L, 18L, 19L, 23L, 24L, 27L
>> ) , c(3L, 10L, 26L, 28L), c(5L, 26L), c(7L, 9L, 11L, 18L, 21L),
>>     c(9L, 20L, 21L), c(2L, 6L, 7L, 11L, 13L, 16L, 22L, 27L),
>>     c(13L, 23L, 24L, 26L), c(2L, 8L, 9L, 17L, 21L), c(1L, 9L,
>>     16L, 17L, 20L), c(2L, 18L, 25L), c(5L, 12L, 13L, 19L, 26L
>>    ) , c(7L, 13L, 19L), c(2L, 4L, 8L, 22L), c(3L, 5L, 14L, 15L,
>>    19L, 23L), c(6L, 13L, 18L), c(10L, 14L)), class = "nb", region.id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
>> "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
>> "25", "26", "27", "28"), call = NA, sym = TRUE)
>>
>> constructed by using dput() - so using spaces permitting line breaks, and for the others:
>>
>> cases <- c(5, 55, 6, 14, 22, 2, 8, 159, 158, 0, 15, 85, 21, 8, 3, 27,
>> 1128, 18, 16, 101, 72, 14, 1, 55, 18, 6, 0, 4)
>>
>> pop <- c(865, 4534, 9708, 1799, 4929, 2154, 4159, 10414, 10917, 1420,
>> 1166, 26098, 8417, 3049, 1853, 2419, 49108, 2982, 4218, 6824,
>> 6923, 1961, 1259, 13661, 2418, 3672, 811, 1375)
>>
>> and
>>
>> my.listW <- nb2listw(nb, style="W")
>>
>> Could you please say why you expect the values of I to be similar? Is this claimed in:
>>
>> Assunção RM, Reis EA 1999 A new proposal to adjust Moran's I for
>> population density. Statistics in Medicine 18, pp. 2147-2162
>
>
> On p. 2157, EBI is n/S0 * sum(z_i*lag(z_i))/sum((z_i - \bar(z))^2)
>
> so sum(z_i*lag(z_i)) is implemented as it stands. If we change this to differences from the mean of z:
>
> sum((z_i - \bar(z))*(lag(z_i - \bar(z))))
>
> we get the same result as moran() on the EB smoothed rates. So the question is whether subtracting the mean of the EB smoothed rates in the numerator was intended but that there is a misprint in the paper, or whether it was intentionally omitted. The implementation follows the printed formula.
>
> I'll add an option to the function to permit the subtraction of the mean smoothed rate.
>
> Roger
>
>>
>> Running
>>
>> summary(glm(cases ~ 1 + offset(log(pop)), family=quasipoisson))
>>
>> suggests that the data are overdispersed, so the assumptions for empirical Bayes adjustment may not be met. For Negative Binomial alternatives, see the DCluster package.
>>
>> Hope this helps,
>>
>> Roger
>>
>> PS. It would be helpful to know why you are asking, and very helpful to have an affiliation in your signature. At least you use a real name!
>>
>>>
>>>  Any advice will be very much appreciated.
>>>
>>>  Thanks in advance,
>>>
>>>  RA
>>>
>>>
>>> # THE CODE:
>>> # obviously loading the package:
>>>
>>>  require(spdep)
>>>
>>>  #this big messy piece of code recreates in matrix form the neighborhood
>>>  structure:
>>>  neighb.structure
>>>  <-matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.142857142857143,0.142857142857143,0,0.142857142857143,0,0,0,0,0,0,0.142857142857143,0,0.142857142857143,0,0.142857142857143,0,0,0.142857142857143,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0.25,0,0.25,0,0,0,0,0,0.25,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0.25,0,0,0,0,0.166666666666667,0,0,0,0,0,0,0,0,0.166666666666667,0,0,0,0,0.166666666666667,0.166666666666667,0,0,0.166666666666667,0.166666666666667,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0,0,0,0
>>>  ,0,!
>>>  0,0,0,0,0,0.25,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0,0.125,0.125,0,0,0,0,0.125,0,0,0,0,0,0.125,0.125,0,0,0,0.125,0.125,0,0,0.125,0,0,0,0.25,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0,0,0,0,0,0.2,0,0.2,0,0.2,0,0,0,0,0,0,0.2,0,0,0.2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0.333333333333333,0,0,0,0,0,0,0,0,0.125,0,0,0,0.125,0.125,0,0,0,0.125,0,0.125,0,0,0.125,0,0,0,0,0,0.125,0,0,0,0,0.125,0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0,0.25,0.25,0,0.25,0,0,0,0.2,0,0,0,0,0,0.2,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0.2,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0.2,0,0,0,0,0,0.2,0,0,0,0,0,0,0.2,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0.33
>>>  33!
>>>  33333333333,0,0,0,0,0,0,0,0,0,0,0.25,0,0.25,0,0,0,0.25,0,0,0,0,0,0,0,0
>>>  ,0,0,0,0,0,0.25,0,0,0,0,0,0,0,0,0.166666666666667,0,0.166666666666667,0,0,0,0,0,0,0,0,0.166666666666667,0.166666666666667,0,0,0,0.166666666666667,0,0,0,0.166666666666667,0,0,0,0,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0.333333333333333,0,0,0,0,0.333333333333333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5,0,0,0,0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0),28,28)
>>>
>>>  #recreating the absolute value of the cases of interest in each feature of
>>>  the map:
>>>  cases <-
>>>  c(5,55,6,14,22,2,8,159,158,0,15,85,21,8,3,27,1128,18,16,101,72,14,1,55,18,6,0,4)
>>>
>>>  #recreating  the absolute value of the population in each feature of the
>>>  map:
>>>  pop <-
>>>  c(865,4534,9708,1799,4929,2154,4159,10414,10917,1420,1166,26098,8417,3049,1853,2419,49108,2982,4218,6824,6923,1961,1259,13661,2418,3672,811,1375)
>>>
>>>  #recreating the rate values in each feature of the map:
>>>  rates <- cases/pop
>>>
>>>  #generating a listW:
>>>  neighb.structure <- t(neighb.structure)
>>>  row.names(neighb.structure) <-
>>>  my.listW <- mat2listw(neighb.structure)
>>>
>>>  #running the tests:
>>>  moran.test(x=rates, listw=my.listW)
>>>  moran.mc(x=rates, listw=my.listW, nsim=999)
>>>  EBImoran.mc(n=cases, x=pop, listw=my.listW, nsim=999)
>>>
>>>   [[alternative HTML version deleted]]
>>>
>>>  _______________________________________________
>>>  R-sig-Geo mailing list
>>>  R-sig-Geo at r-project.org
>>>  https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 91 00
> e-mail: Roger.Bivand at nhh.no
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



More information about the R-sig-Geo mailing list