[R-sig-Geo] credible interval for empirical Bayesian estimates of rates

Fri Apr 24 22:12:18 CEST 2020

Wonderful thank you.

GeoDA and the spdep::EBest are providing the same estimates for my
datasets. Thanks for the reprex example with the GUI via ">". That is a
helpful convention I'll use in the future..

Ok great, there is not yet a CI implementation. If I can manage one I'd be
delighted to share and contribute it.

Julie Silge: https://juliasilge.com/blog/bayesian-blues/

and David Robinson
http://varianceexplained.org/r/credible_intervals_baseball/

have some suggestions that are similar to each other's (no surprise there,
they work together a lot).

Thanks for the suggestions about alternative model specifications and
including covariates.

This has been very helpful, I appreciate this list immensely,
Dexter

On Fri, Apr 24, 2020 at 3:05 PM Roger Bivand <Roger.Bivand using nhh.no> wrote:

> On Fri, 24 Apr 2020, Dexter Locke wrote:
>
> > Thanks Roger and list.
> >
> > I didn't think a repex was needed because a question was: why does
> > spdep::EBest(counts, population, family = 'binomial') give the same
> > results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa
> > calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think
> > this is a question about terminology. The same results were achieved with
> > the packages while naming the model differently - why?
>
> Reproducible example:
>
> auckland <- st_read(system.file("shapes/auckland.shp",
>    package="spData")[1], quiet=TRUE)
> res <- EBest(auckland$M77_85, 9*auckland$Und5_81)
> res0 <- EBest(auckland$M77_85, 9*auckland$Und5_81, family="binomial")
>
> > summary(res$estmm)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
> > summary(res0$estmm)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001484 0.002235 0.002549 0.002648 0.002968 0.004536
>
> After calculating Und5_81 * 9 as a new column, running GeoDa -> Map ->
> Rates-Caluculated Map -> Empirical Bayes and exporting a shapefile, in R I
> see:
>
> > summary(auck$R_EBS)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
>
> which is the same as the Poisson, and:
>
> > all.equal(res$estmm, auck$R_EBS)
> [1] TRUE
>
> GeoDa is providing the same EB Poisson as EBest, isn't it? If yours
> differ, are both implementations seeing the same data?
>
> >
> > Yes ?spdep::EBest directed me to the literature I'm struggling to access.
> > And Yes, I've been looking at the raw code and understand how the estmm
> is
> > generated.
> >
> > I've been using the epitools::pois.exact() and spdep::EBest. I can
> compare
> > the point estimates from pois.exact to those provided by EBest, but I'd
> > like to graph side by side their credible / confidence intervals.
> >
> > Its this last part on the credible intervals I'm interested in. How to
> get
> > credible intervals around estmm? This is my main question.
>
> EBest() was written to implement Bailey & Gatrell's textbook, which did
> not provide CI, and just used the Marshall Auckland dataset. If you'd like
> to implement them, I'd welcome a contribution.
>
> >
> > ASDAR is a reference I'm using all the time. Thanks for that gem.
> >
> > DCluster::empbaysmooth also does not provide a credible interval, either.
> >
>
> As you see from ch. 10, CI are described for the epitools implementation.
> My feeling is that the literature moved away from simple EB rates towards
> IID RE and spatially structured RE, with relevant covariates, say like the
> classic Scottish Lip Cancer dataset, and currently CARBayes is a solid
> package among many others. Simply using base population becomes too
> unsatisfactory. PHE uses funnel plots which do have CI of a kind, to draw
> attention from small base populations:
> https://nhsrcommunity.com/blog/introduction-to-funnel-plots/ which can be
> used to adjust class intervals for mapping.
>
> Roger
>
> > -Dexter
> > http://dexterlocke.com/
> >
> >
> >
> > On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <Roger.Bivand using nhh.no>
> wrote:
> >
> >> On Fri, 24 Apr 2020, Dexter Locke wrote:
> >>
> >>> Dear esteemed list,
> >>>
> >>> I'm using spdep::EBest with family = 'binomial' for counts of events
> >> within
> >>> polygons that have an 'at risk' population. The resultant "estmm" is
> >>> 'shrunk' compared to the raw rate (both given by EBest and calculated
> "by
> >>> hand" rate. All good there.
> >>>
> >>> Using GeoDa version 1.14.0 24 August 2019 produces identical results
> for
> >>> its Empirical Bayesian rate. This was confirmed by plotting the EBest
> >>> output against GeoDa's rate and finding a perfect correlation along
> the 1
> >>> to 1 line. All good there.
> >>
> >> Please provide a reproducible example, as this may help with answers.
> >>
> >>>
> >>> Two questions:
> >>> 1. How can credible intervals around these smoothed rate estimates be
> >>> calculated?
> >>> 2. The spdep documentation calls this a binomial family, but the
> >> identical
> >>> results are obtained from GeoDa calls this "Poisson-Gamma" model here:
> >>> https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 ,
> so
> >>> what is actually being calculated? This question may help me answer the
> >>> first question..
> >>
> >> No, the default family is "poisson", with "binomial" available for
> >> non-rare conditions following Martuzzi, implemented by Olaf
> >> Berke, ?spdep::EBest.
> >>
> >> The code in spdep is easily accessible, so can be read directly. Please
> >> also compare with code for the EB Moran test, and with analogous code in
> >> the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section
> >> 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs.
> >> For code and data see
> https://asdar-book.org/bundles2ed/dismap_bundle.zip.
> >>
> >>>
> >>> Possibly the answers are addressed in the literature cited which I
> cannot
> >>> access right now at home without institutional library access.
> >>>
> >>
> >> Most institutions do have proxy or VPN access, but the code will be as
> >> useful. In PySAL, the code would also guide you, but even though GeoDa
> is
> >> open source, the C++ is fairly dense.
> >>
> >> Hope this helps,
> >>
> >> Roger
> >>
> >>> Thanks for your consideration,
> >>> Dexter
> >>> http://dexterlocke.com/
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> R-sig-Geo using r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> >> https://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

	[[alternative HTML version deleted]]