[R-sig-Geo] Seeking guidance for best application of spdep localG_perm
Cunningham, Angela
cunninghamar at ornl.gov
Wed Jul 3 12:24:41 CEST 2024
Hello all,
I am using spdep (via sfdep) for a cluster analysis of the rate of rare events, and I am hoping you can provide some advice on how to apply these functions most appropriately. Specifically, I am interested in any guidance about which significance calculation is best suited to these circumstances, and which adjustment (if any) for multiple testing and spatial dependence (Bonferroni, FDR, etc.) should be paired with each of the different p-value calculations.
When running localG_perm(), three Pr values are returned: Pr(z != E(Gi)), Pr(z != E(Gi)) Sim, and Pr(folded) Sim. My understanding is that the first value is based on the mean and should only be used for normally distributed data, that the second uses a rank-percentile approach and is more robust, and that the last uses a PySAL-based calculation and may be quite sensitive. Is this correct? The second, Pr(z != E(Gi)) Sim, appears to be the most appropriate for my data; would you suggest otherwise?
The documentation for localG_perm states that "for inference, a Bonferroni-type test is suggested"; thus any adjustment, e.g. for multiple testing, must be made in a second step, such as with the p.adjust argument of the hotspot() function, correct? Further, while fdr is the default for hotspot(), are there situations (such as having small numbers, running a large number of simulations, or using a particular Prname) that would call for a different p.adjust method?
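To make the question concrete, here is a minimal sketch of the two-step workflow described above. It runs on simulated data: the 10 x 10 grid, the Poisson counts, the 0.05 cutoff, and the seed values are all illustrative assumptions, not my actual data or settings.

```r
library(spdep)

# Hypothetical data: a 10 x 10 rook-contiguity grid with simulated counts,
# standing in for real rates (assumption for illustration only).
set.seed(1)
nb <- cell2nb(10, 10, type = "rook")
lw <- nb2listw(nb, style = "W")
x <- rpois(100, lambda = 2)

# Step 1: conditional permutation; iseed fixes the permutation RNG.
g <- localG_perm(x, lw, nsim = 499, iseed = 1)

# The three Pr columns are stored in the "internals" attribute.
colnames(attr(g, "internals"))

# Step 2: classify using one chosen p-value column and one adjustment method.
pr <- "Pr(z != E(Gi)) Sim"
hs_fdr  <- hotspot(g, Prname = pr, cutoff = 0.05, p.adjust = "fdr")
hs_bonf <- hotspot(g, Prname = pr, cutoff = 0.05, p.adjust = "bonferroni")
hs_none <- hotspot(g, Prname = pr, cutoff = 0.05, p.adjust = "none")

# Compare how many observations each method flags (NA = not flagged).
table(hs_fdr, useNA = "always")
table(hs_bonf, useNA = "always")
table(hs_none, useNA = "always")
```

Swapping pr for one of the other two column names, or changing p.adjust, is the comparison I am unsure how to interpret.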
Also, if I can bother you all with a very basic question: given that significance is determined through conditional permutation simulation, increasing the number of simulations should refine the results and make them more reliable. Unless a seed is set, however, I assume it is always possible that results will change slightly across separate runs of a model, perhaps shifting an observation to either side of a threshold. Aside from computation time, are there other reasons to avoid increasing the number of simulations beyond a certain point? (It feels a bit like "p-hacking" to increase nsim ad infinitum.) Are slight discrepancies in hot spot assignment between runs to be expected even with a large number of permutations? Is this particularly the case when working with small numbers?
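The run-to-run variation I am asking about can be sketched as follows, again on simulated data (the grid, the counts, nsim = 999, and the 0.05 threshold are assumptions chosen for illustration). The last lines use the usual binomial approximation for the Monte Carlo standard error of a simulated p-value near 0.05; this is only a rough guide and not necessarily how spdep computes its rank-based values.

```r
library(spdep)

# Hypothetical data (assumption for illustration only).
set.seed(1)
lw <- nb2listw(cell2nb(10, 10, type = "rook"), style = "W")
x <- rpois(100, lambda = 2)
pr <- "Pr(z != E(Gi)) Sim"

# Helper (hypothetical name): run the permutation test and return one Pr column.
run <- function(nsim, iseed) {
  g <- localG_perm(x, lw, nsim = nsim, iseed = iseed)
  attr(g, "internals")[, pr]
}

p_a <- run(999, iseed = 1)
p_b <- run(999, iseed = 1)  # same seed as p_a
p_c <- run(999, iseed = 2)  # different seed

# Same seed: the simulated p-values should match exactly.
identical(p_a, p_b)

# Different seed: how far the p-values move, and how many observations
# end up on the other side of an (assumed) 0.05 threshold.
summary(abs(p_a - p_c))
sum((p_a < 0.05) != (p_c < 0.05))

# Approximate Monte Carlo standard error of a p-value near 0.05,
# which shrinks only with the square root of nsim.
nsim <- c(499, 999, 9999)
mc_se <- sqrt(0.05 * 0.95 / nsim)
data.frame(nsim = nsim, mc_se = mc_se)
```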
Thank you for your time and consideration.
Angela R Cunningham, PhD
Spatial Demographer (R&D Associate)
Human Geography Group | Human Dynamics Section
Oak Ridge National Laboratory
Computational Sciences Building (5600), O401-29
1 Bethel Valley Road, Oak Ridge, TN 37830
cunninghamar at ornl.gov