[R-sig-Geo] Modeling areal data with lots of holes, islands

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Aug 10 10:03:54 CEST 2015


Dear Tim,

I assume that you know the location of all counties. Then you could make a
graph of the counties based on the neighbours of each county. Use that
graph in INLA to estimate the random effect of county. It can handle the
counties without data. Let's say that you have a simple graph: A - B - C -
D - E. The correlation between two neighbours is rho. If C has no data,
then INLA will estimate its effect so that the correlation between B-C and
C-D is rho. As a result, the correlation between B-D will be rho ^ 2.
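
As a rough illustration of the A - B - C - D - E example, here is a small
sketch in plain Python. It is not INLA and does not reproduce the model
INLA would fit. It assumes a stationary first-order autoregressive chain
with unit variance as a stand-in for the county effect, and the value of
rho is hypothetical. The function names (chain_precision, invert,
correlation) are made up for this example. It builds the precision matrix
implied by the chain, inverts it, and reads off the correlations.

```python
# Sketch: neighbour correlation on the chain A - B - C - D - E.
# Assumption: a stationary, unit-variance AR(1)-type Gaussian Markov chain,
# used only as a stand-in for the county random effect (not INLA's model).

def chain_precision(n, rho):
    """Tridiagonal precision matrix of a unit-variance AR(1) chain of length n."""
    scale = 1.0 / (1.0 - rho ** 2)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # End nodes have one neighbour, interior nodes have two.
        q[i][i] = scale * (1.0 if i in (0, n - 1) else 1.0 + rho ** 2)
        if i + 1 < n:
            q[i][i + 1] = q[i + 1][i] = -scale * rho
    return q


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def correlation(cov, i, j):
    """Correlation between nodes i and j from a covariance matrix."""
    return cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5


nodes = ["A", "B", "C", "D", "E"]
rho = 0.6  # hypothetical neighbour correlation
cov = invert(chain_precision(len(nodes), rho))
b, c, d = nodes.index("B"), nodes.index("C"), nodes.index("D")

print("cor(B, C) =", round(correlation(cov, b, c), 6))
print("cor(C, D) =", round(correlation(cov, c, d), 6))
print("cor(B, D) =", round(correlation(cov, b, d), 6))
```

Under these assumptions the first two correlations equal rho and the
third equals rho ^ 2, whether or not C has any observations: the
correlation between B and D passes through the unobserved node C.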

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-07-22 21:10 GMT+02:00 Tim Meehan <tmeeha at gmail.com>:

> Hi All,
>
> I am using R to model data with repeated measures and strong spatial
> autocorrelation in OLS residuals.  Response and independent variables are
> summarized at the county level (2300 counties across contiguous US, each
> measured once during each of four years).
>
> There are lots of challenges with this data. First, the response is a
> proportion, so I am looking at modeling methods that can accommodate a
> beta-distribution (otherwise I'll logit-transform the response, or ignore
> the issue since residual distributions don't look too bad).  Second, the 4
> repeated measures per county sort of call for either the ability to have
> random intercepts or the ability to incorporate a temporal correlation
> structure.  Third, the strong spatial patterning of residuals should
> probably be dealt with somehow.  The modeling tools I am considering
> include mgcv::gam, mgcv::gamm, gamm4::gamm4, INLA::inla, spTimer::spTGibbs,
> etc.
>
> In order to choose the right tool, I need to decide on a way to specify the
> spatial relationships.  This is areal data, so it seems most common and
> correct to describe spatial relationships with graphs and employ
> neighborhood-oriented modeling approaches (e.g., CAR models).  But this
> data set is unusual in that I don't have data for all counties in the
> contiguous US (roughly 2300 out of 3100).  There are a lot of holes and
> islands in the map.  Conceptually, assigning neighbors seems odd when there
> are big gaps between counties, especially when I suspect that the spatial
> pattern in the data is due to continuous spatial processes.
>
> I would prefer to model these data using a geostatistical approach, using
> county centroids.  I know this is not ideal.  I have tried both approaches
> and they both yield similar fixed effect estimates for independent
> variables of interest. But the geostatistical approach produces better
> fitting models, eliminating nearly all residual autocorrelation.
>
> So, finally, the question.  Is it reasonable to model these areal data
> using a geostatistical approach given (1) there are lots of holes and
> islands in the areal data, (2) I suspect the spatial patterning to be due
> to continuous spatial processes, not adjacency of administrative
> boundaries, (3) there are roughly 2300 counties in the analysis, where
> variation in county size and shape is small compared to the continental
> analysis extent, (4) a geostatistical model does a better job of removing
> residual autocorrelation, and (5) I better understand the geostatistical
> ways of specifying spatial relationships?  What are the practical
> consequences of using geostatistical methods for areal data?  Are they
> greater than the consequences of using odd neighborhood specifications?
>
> Thanks for any advice you can offer.
>
> Best,
>
> Tim
>
> P.S. Sorry if this is the wrong venue for this question.  Please let me
> know if there is a better place to send it.
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
