[R-sig-Geo] Seeking guidance for best application of spdep localG_perm

Josiah Parry josiah.parry at gmail.com
Wed Jul 3 13:34:59 CEST 2024


This is all very well said! I would recommend using the percentile-based
approach that Roger implemented. The Pysal folks are in the process of
adopting it (with a slight adjustment). I think it is the most “accurate”
p-value you will get from the functions today.
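
For reference, here is a minimal sketch of pulling that column out of a
localG_perm() result; the toy data and neighbour setup are only
placeholders, and the column name is the one quoted in your message below:

  library(spdep)
  library(sf)

  # example polygons shipped with sf; substitute your own sf object and rate
  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
  lw <- nb2listw(poly2nb(nc), style = "W")

  set.seed(1)
  gi <- localG_perm(nc$SID74 / nc$BIR74, lw, nsim = 999)

  # the simulation-based p-values are stored in the "internals" attribute
  p_rank <- attr(gi, "internals")[, "Pr(z != E(Gi)) Sim"]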

I don’t have a recommendation for an upper bound on the number of
simulations. But you do bring up a good point about how observations get
classified between runs. I don’t think I’m as qualified to answer that one!

On Wed, Jul 3, 2024 at 06:25 Cunningham, Angela via R-sig-Geo <
r-sig-geo at r-project.org> wrote:

> Hello all,
>
> I am using spdep (via sfdep) for a cluster analysis of the rate of rare
> events.  I am hoping you can provide some advice on how to apply these
> functions most appropriately. Specifically I am interested in any guidance
> about which significance calculation might be best in these circumstances,
> and which (if any) adjustment for multiple testing and spatial dependence
> (Bonferroni, FDR, etc.) should be paired with the different p-value
> calculations.
>
> When running localG_perm(), three Pr values are returned: Pr(z != E(Gi)),
> Pr(z != E(Gi)) Sim, and Pr(folded) Sim. My understanding is that the first
> value is based on the mean and should only be used for normally distributed
> data, that the second uses a rank-percentile approach and is more robust,
> and that the last uses a Pysal-based calculation and may be quite
> sensitive. Is this correct? The second, Pr(z != E(Gi)) Sim, appears to be
> the most appropriate for my data situation; would you suggest otherwise?
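>
> (To make sure I am reading the output correctly, I have been comparing the
> three columns along these lines; x stands in for my rate variable and lw
> for my listw object, so this is only a sketch:)
>
>   gi  <- localG_perm(x, lw, nsim = 999)
>   prs <- attr(gi, "internals")             # matrix of internal values
>   head(prs[, c("Pr(z != E(Gi))",           # analytical, normality-based
>                "Pr(z != E(Gi)) Sim",       # rank/percentile, simulation-based
>                "Pr(folded) Sim")])         # Pysal-style folded p-value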
>
> The documentation for localG_perm states that "for inference, a
> Bonferroni-type test is suggested"; thus any adjustments for e.g. multiple
> testing must be made in a second step, such as with the p.adjust arguments
> in the hotspot() function, correct? Further, while fdr is the default for
> hotspot(), are there situations like having small numbers, a large number
> of simulations, or employing a particular Prname, that would call for a
> different p.adjust method?
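>
> (Concretely, I take it the two-step version would look roughly like the
> sketch below, where gi is the localG_perm() result; I am assuming hotspot()
> takes Prname, cutoff and p.adjust arguments:)
>
>   p_rank <- attr(gi, "internals")[, "Pr(z != E(Gi)) Sim"]
>   p_fdr  <- p.adjust(p_rank, method = "fdr")    # manual second step
>
>   # or let hotspot() apply the adjustment and the cut-off in one call
>   cl <- hotspot(gi, Prname = "Pr(z != E(Gi)) Sim",
>                 cutoff = 0.05, p.adjust = "fdr")
>   table(cl, useNA = "ifany")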
>
> Also, if I can bother you all with a very basic question: given that
> significance is determined through conditional permutation simulation,
> increasing the number of simulations should refine the results and make
> them more reliable, but unless a seed is set, I assume it is still always
> possible that results will change slightly across separate runs of a model,
> perhaps shifting an observation to either side of a threshold. Aside from
> computation time, are there other reasons to avoid increasing the number of
> simulations beyond a certain point? (It feels a bit like "p-hacking" to
> increase nsim ad infinitum.) Are slight discrepancies in hot spot
> assignment between runs to be expected even with a large number of
> permutations? Is this particularly the case when working with small numbers?
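>
> (The kind of check I have in mind is something like the sketch below, again
> with x and lw standing in for my data, and no seed set so that the two runs
> use different permutations:)
>
>   g1 <- localG_perm(x, lw, nsim = 9999)   # two independent runs
>   g2 <- localG_perm(x, lw, nsim = 9999)
>   p1 <- attr(g1, "internals")[, "Pr(z != E(Gi)) Sim"]
>   p2 <- attr(g2, "internals")[, "Pr(z != E(Gi)) Sim"]
>   table(p1 <= 0.05, p2 <= 0.05)           # observations flipping across 0.05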
>
> Thank you for your time and consideration.
>
>
> Angela R Cunningham, PhD
> Spatial Demographer (R&D Associate)
> Human Geography Group | Human Dynamics Section
>
> Oak Ridge National Laboratory
> Computational Sciences Building (5600), O401-29
> 1 Bethel Valley Road, Oak Ridge, TN 37830
> cunninghamar at ornl.gov
>
