[R-sig-Geo] How to efficiently generate data of neighboring points

Fri Jun 5 12:28:18 CEST 2020

Thank you once again. To clarify, which is more suitable: end-of-year water
levels or a yearly average of water levels?

Also below are a few more notes to throw more light on my variables/data:

These wells are solely for irrigation purposes and are
irrigator/farmer-owned and operated.
No farmer/irrigator moves to another well not owned by him. The only reason
to suspect any spatial externalities is because the wells share a common
aquifer.
And this is essentially what I am testing.

It is also understood that there is not much variation in the geography
and geology of the study region.

I have data on a number of well-specific features in addition to the water
level. I also have some farm data, including cropping and technology use
data. No soil data though.
No recharge data either.

In fact, I agree a lot of factors can come into play here and I may not have
or observe all of them, but I was thinking I could incorporate some fixed
effects to take care of those, especially for those I suspect (or perhaps by
theory) are likely to not vary much in terms of their effect on irrigation
(pumping) decisions across farmers or their effect on water level.

My panel is rather short: I have five years of panel data.

Given the above, is it still not advisable to use any spatial econometric
analysis? Would a simple OLS suffice?
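
To make the question concrete, here is a minimal sketch of an SLX-style
regression with well and year fixed effects, using base R only. Everything in
it is simulated and hypothetical (the coordinates, the single distance band,
the variable names level, level_nb and withdrawal, and the coefficients used
to generate the data). The regressors are exogenous by construction, so it
says nothing about the endogeneity concerns raised in this thread.

```r
set.seed(42)
n_wells <- 40
n_years <- 5

# Hypothetical projected well coordinates in metres
coords <- cbind(runif(n_wells, 0, 8000), runif(n_wells, 0, 8000))

# Binary neighbour matrix for one band: 0 < distance <= 1609.34 m
d <- as.matrix(dist(coords))
W <- (d > 0 & d <= 1609.34) * 1
nb <- rowSums(W)

# Balanced panel; expand.grid() varies `well` fastest within each year
panel <- expand.grid(well = seq_len(n_wells), year = seq_len(n_years))
panel$level <- rnorm(nrow(panel), mean = 30, sd = 5)

# Spatially lagged x: mean neighbour water level, computed year by year.
# Wells with no neighbours in the band get 0 (an assumption of this sketch).
panel$level_nb <- NA_real_
for (t in seq_len(n_years)) {
  idx <- panel$year == t
  x_t <- panel$level[idx]
  panel$level_nb[idx] <- ifelse(nb > 0, as.vector(W %*% x_t) / nb, 0)
}

# Simulated response with made-up coefficients
panel$withdrawal <- 100 - 0.8 * panel$level + 0.3 * panel$level_nb +
  rnorm(nrow(panel))

# SLX-style OLS with well and year fixed effects
fit <- lm(withdrawal ~ level + level_nb + factor(well) + factor(year),
          data = panel)
coef(fit)[c("level", "level_nb")]
```

With real data the neighbour term would be built from the observed bands, and
whether the resulting estimates can be interpreted at all depends on the
issues discussed in the replies.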

Thanks.
----------------------
Lom

On Fri, Jun 5, 2020 at 3:51 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:

> On Fri, 5 Jun 2020, Lom Navanyo wrote:
>
> > I fully agree with you and appreciate the listed benefits of not taking
> > things private. I was just trying to be sure the forum here is
> > appropriate and receptive of a beginner like me.
> >
> > To be more explicit with regards to my observations, y is amount of
> > water withdrawal from wells and an important variable in x is (height
> > of) water level in the wells. These are end of year figures. I am using
> > the aggregations (sum for y and mean for water level) by band as spatial
> > neighborhood variables. There will be one or two indicator variables
> > also in x. I hope these do not present additional hurdles.
>
> There are several further questions. If water level is measured at
> end-of-year, it is instantaneous at that point, and will depend on level a
> year earlier plus inflow from the movements of the water table
> (precipitation, soils and surface geology, maybe geology if deeper wells),
> minus evaporation (if an open well) and extraction. However, your y
> (extraction) is probably measured over an interval (1 Jan - 31 Dec?). It
> does not depend on level unless level is 0, but depends on the closeness
> of people extracting water for domestic, agricultural or other use.
>
> All else equal, you would expect changes in the level in a well to depend
> on inputs, evaporation and extraction, and extraction at that well and
> other nearby wells (which may experience falls in the ground water table
> level not because the water was extracted from those wells, but at
> neighbouring wells). You may also see users shifting to nearby wells if
> their closest well runs dry.
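
The water balance described in the two paragraphs above can be written
compactly. This is only a sketch: all symbols are introduced here for
illustration (h for level, r for inflow, e for evaporation, q for extraction,
w_ij for a hypothetical weight on extraction at neighbouring well j), and the
additive form of the neighbour term is an assumption, not a hydrological
result.

```latex
h_{i,t} = h_{i,t-1} + r_{i,t} - e_{i,t} - q_{i,t} - \sum_{j \neq i} w_{ij}\, q_{j,t}
```

Here all terms are taken to be expressed in units of water level, so
extraction volumes would first need converting, which itself depends on the
aquifer properties at each well.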
>
> So you probably need to start with a deterministic hydrological model, and
> you need much more information about who extracts and why. Say in India,
> you would also need price data - apparently free water has led to
> over-extraction.
>
> So I would advise against any spatial econometric analysis of the data you
> have, because so much is going on in the system as a whole that you cannot
> control if all the data you have is as you describe. I also understand
> better why well water level is endogenous, but am sure that IV will not
> help, since the level is being driven partly by a deterministic
> hydrological system which differs from well to well, and extraction varies
> by demand.
>
> Has anyone worked with this kind of data? Any ideas or contributions more
> helpful than the above?
>
> Roger
>
> >
> > I am thinking Proximity is relevant in testing spatial
> > dependency/externality.
> >
> > I will consider the splm package and the SLX model.
> >
> > Thank you.
> > ---------------
> > Lom
> >
> > On Thu, Jun 4, 2020 at 2:52 PM Roger Bivand <Roger.Bivand using nhh.no> wrote:
> >
> >> On Thu, 4 Jun 2020, Lom Navanyo wrote:
> >>
> >>> Thank you. Yes, the OLS is biased and my plan is to use a 2SLS approach.
> >>> I have a variable I intend to use as an IV for y.
> >>> I have seen a few papers use this approach. Will this approach not
> >>> correct for the endogeneity?
> >>>
> >>> Actually, I am not sure if this is the right forum or perhaps if it's
> >>> appropriate or acceptable to you to take this one-on-one with you for
> >>> help:
> >>
> >> I do not offer private help. That would presuppose that one person has
> >> the answer. It would also presuppose that all exchanges are only read by
> >> the original poster and direct participants, while in fact others may
> >> join in, or follow a thread, or find the thread by searching: Google
> >> supports the list:r-sig-geo search tag. If the thread goes private, that
> >> search is fruitless.
> >>
> >>> My model actually looks like this: y = f(y, x) + e.
> >>> Aside from the endogeneity of y (which I intend to instrument by another
> >>> variable z), there is simultaneity between y and x.
> >>> I intend to use the lag of x as an instrument for x. Given that I am
> >>> seeking to test spatial dependency, do you see some fatal flaws with my
> >>> approach?
> >>>
> >>
> >> What is the support of your observations: are they points, or are they
> >> aggregations? Why may proximity make a difference - often, apparent
> >> spatial autocorrelation is caused by observing inappropriate entities,
> >> or by omitting covariates, or by using the wrong functional form.
> >>
> >>
> >>> I have also seen other empirical approaches like static and dynamic
> >>> spatial panel data modelling. I will be reviewing them also to see
> >>> suitability for my objective.
> >>> But, any further directions or suggestions are highly appreciated.
> >>
> >> If the data are spatial panel, you can look at the splm package.
> >> Personally, I have never found instruments any use at all, because the
> >> instruments are typically at best weak because of shared spatial
> >> processes with the response, unless the model is really well specified
> >> from known theory. In space, almost everything is close to endogenous
> >> unless the opposite is demonstrated. So causal relationships are less
> >> worthwhile, because they are at best conditional on omitted variables
> >> and autocorrelation engendered by the choice of observational entities.
> >>
> >> Further, because spatial processes are driven by the inverse matrix of
> >> the input graph of proximate neighbours (the covariance matrix of the
> >> spatial process), you don't need to start from more than the first order
> >> neighbours. Maybe your x has the same spatial pattern as y, so that the
> >> residuals are white noise with no spatial structure.
> >>
> >> Recently, analysts prefer to start with the SLX model (Halleck Vega &
> >> Elhorst 2015 and others), so that might be worth exploring. If only the
> >> direct impacts seem important, OLS may be enough.
> >>
> >> Hope this helps,
> >>
> >> Roger
> >>
> >>>
> >>> Thanks,
> >>> -------------------
> >>> Lom
> >>>
> >>>
> >>>
> >>> On Thu, Jun 4, 2020 at 3:48 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:
> >>>
> >>>> On Thu, 4 Jun 2020, Lom Navanyo wrote:
> >>>>
> >>>>> Thank you very much for your support. This gives me what I need and I
> >>>>> must say listw2sn() is really great.
> >>>>>
> >>>>> Why do I need the data in the format as in data_out? I am trying to
> >>>>> test spatial dependence (or neighborhood effect) by running a
> >>>>> regression model that entails pop_size_it = beta_1*sum of pop_size of
> >>>>> point i's neighbors within a specified radius. So my plan is to get the
> >>>>> neighbors for each focal point as per the specified bands and their
> >>>>> attributes (e.g. pop_size) so I can add them (attribute) by the bands.
> >>>>
> >>>> Thanks, clarifies a good deal. Maybe look at the original localG
> >>>> articles for exploring distance relationships (Getis and Ord looked at
> >>>> HIV/AIDS); ?spdep::localG or
> >>>> https://r-spatial.github.io/spdep/reference/localG.html.
> >>>>
> >>>> Further note that OLS is biased as you have y = f(y) + e, so y on both
> >>>> sides. The nearest equivalent for a single band is
> >>>> spatialreg::lagsarlm() with listw=nb2listw(wd1, style="B") to get the
> >>>> neighbour sums through the weights matrix. So both your betas and their
> >>>> standard errors are unusable, I'm afraid. You are actually very much
> >>>> closer to ordinary kriging, looking at the way in which distance
> >>>> attenuates the correlation in value of proximate observations.
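
To illustrate what "neighbour sums through the weights matrix" means, here is
a minimal base R sketch for a single band. The points and the attribute y are
simulated, and the dense distance matrix is only a stand-in for the nb/listw
objects above; how points lying exactly on a band boundary are treated may
differ from dnearneigh().

```r
set.seed(1)
n <- 50

# Simulated projected coordinates in metres and a count-like attribute
pts <- cbind(x = runif(n, 0, 5000), y = runif(n, 0, 5000))
y <- rpois(n, 20)

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts))

# Binary ("B" style) weights for one band: 1 if 0 < distance <= 1609.34 m
W <- (d > 0 & d <= 1609.34) * 1

# Neighbour sums: element i is the sum of y over the neighbours of point i
Wy <- as.vector(W %*% y)

head(cbind(n_neighbours = rowSums(W), Wy))
```

Putting Wy on the right-hand side of lm() alongside y on the left is the
"y on both sides" situation described above.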
> >>>>
> >>>> Hope this clarifies,
> >>>>
> >>>> Roger
> >>>>
> >>>>>
> >>>>> I am totally new to the area of spatial econometrics, so I am taking
> >>>>> things one step at a time. Some readings suggest I may need a distance
> >>>>> matrix or weight matrix but for now I think I should try the current
> >>>>> approach.
> >>>>>
> >>>>> Thank you.
> >>>>>
> >>>>> -------------
> >>>>> Lom
> >>>>>
> >>>>> On Wed, Jun 3, 2020 at 8:18 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:
> >>>>>
> >>>>>> On Wed, 3 Jun 2020, Lom Navanyo wrote:
> >>>>>>
> >>>>>>> I had the errors with rtree using R 3.6.3. I have since changed to R
> >>>>>>> 4.0.0 but I got the same error.
> >>>>>>>
> >>>>>>> And yes, for Roger's example, I have the objects wd1, ... wd4, all
> >>>>>>> with length 101. I think my difficulty is my inability to output the
> >>>>>>> list detailing the point IDs t50_fid.
> >>>>>>
> >>>>>> library(spData)
> >>>>>> library(sf)
> >>>>>> projdata<-st_transform(nz_height, 32759)
> >>>>>> pts <- st_coordinates(projdata)
> >>>>>> library(spdep)
> >>>>>> bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
> >>>>>> bds <- c(0, bufferR)
> >>>>>> wd1 <- dnearneigh(pts, bds[1], bds[2])
> >>>>>> wd2 <- dnearneigh(pts, bds[2], bds[3])
> >>>>>> wd3 <- dnearneigh(pts, bds[3], bds[4])
> >>>>>> wd4 <- dnearneigh(pts, bds[4], bds[5])
> >>>>>> sn_band1 <- listw2sn(nb2listw(wd1, style="B", zero.policy=TRUE))
> >>>>>> sn_band1$band <- paste(attr(wd1, "distances"), collapse="-")
> >>>>>> sn_band2 <- listw2sn(nb2listw(wd2, style="B", zero.policy=TRUE))
> >>>>>> sn_band2$band <- paste(attr(wd2, "distances"), collapse="-")
> >>>>>> sn_band3 <- listw2sn(nb2listw(wd3, style="B", zero.policy=TRUE))
> >>>>>> sn_band3$band <- paste(attr(wd3, "distances"), collapse="-")
> >>>>>> sn_band4 <- listw2sn(nb2listw(wd4, style="B", zero.policy=TRUE))
> >>>>>> sn_band4$band <- paste(attr(wd4, "distances"), collapse="-")
> >>>>>> data_out <- do.call("rbind", list(sn_band1, sn_band2, sn_band3, sn_band4))
> >>>>>> class(data_out) <- "data.frame"
> >>>>>> table(data_out$band)
> >>>>>> data_out$ID_from <- projdata$t50_fid[data_out$from]
> >>>>>> data_out$ID_to <- projdata$t50_fid[data_out$to]
> >>>>>> data_out$elev_from <- projdata$elevation[data_out$from]
> >>>>>> data_out$elev_to <- projdata$elevation[data_out$to]
> >>>>>> str(data_out)
> >>>>>>
> >>>>>> The "spatial.neighbour" representation was that used in the S-Plus
> >>>>>> SpatialStats module, with "from" and "to" columns, and here drops
> >>>>>> no-neighbour cases gracefully. So listw2sn() comes in useful
> >>>>>> for creating the output, and from there, just look-up in the
> >>>>>> input data.frame. Observations here cannot be their own neighbours.
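
Once the from/to table exists, adding up a neighbour attribute by band is a
plain aggregation. A minimal sketch in base R follows; the small data_out
below is a hand-made, hypothetical stand-in with the same column layout as the
one built above, so it runs without spdep or the nz_height data.

```r
# Hypothetical stand-in for data_out: one row per (from, to) neighbour pair
data_out <- data.frame(
  ID_from = c(1, 1, 1, 2, 2, 3),
  ID_to   = c(2, 3, 4, 1, 4, 1),
  band    = c("0-402.336", "0-402.336", "402.336-1609.34",
              "0-402.336", "402.336-1609.34", "0-402.336"),
  elev_to = c(2700, 2800, 2900, 2750, 2900, 2750)
)

# Sum of the neighbours' attribute per focal point and band
band_sums <- aggregate(elev_to ~ ID_from + band, data = data_out, FUN = sum)
names(band_sums)[3] <- "elev_sum"

# Wide layout: one row per focal point, one column per band
band_wide <- reshape(band_sums, idvar = "ID_from", timevar = "band",
                     direction = "wide")
band_wide
```

Focal points with no neighbours in a band come out as NA in the wide table
(or are absent altogether if they have no neighbours in any band), so they
need to be set to 0 explicitly if a sum of zero is what is meant.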
> >>>>>>
> >>>>>> It would be relevant to know why you need these, are you looking at
> >>>>>> variogram clouds?
> >>>>>>
> >>>>>> Hope this clarifies,
> >>>>>>
> >>>>>> Roger
> >>>>>>
> >>>>>>>
> >>>>>>> ---------
> >>>>>>> Lom
> >>>>>>>
> >>>>>>> On Tue, Jun 2, 2020 at 8:02 PM Kent Johnson <kent3737 using gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Roger's example works for me and gives a list of length 101. I did
> >>>>>>>> have some issues that were resolved by updating packages. I'm using R
> >>>>>>>> 3.6.3 on macOS 10.15.4. I also use rtree successfully on Windows 10
> >>>>>>>> with R 3.6.3.
> >>>>>>>>
> >>>>>>>> Kent
> >>>>>>>>
> >>>>>>>> On Tue, Jun 2, 2020 at 12:29 PM Roger Bivand <Roger.Bivand using nhh.no> wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, 2 Jun 2020, Kent Johnson wrote:
> >>>>>>>>>
> >>>>>>>>>> rtree uses Euclidean distance so the points should be in a
> >>>>>>>>>> coordinate system where this makes sense at least as a reasonable
> >>>>>>>>>> approximation.
> >>>>>>>>>
> >>>>>>>>> I tried the original example:
> >>>>>>>>>
> >>>>>>>>> remotes::install_github("hunzikp/rtree")
> >>>>>>>>> library(spData)
> >>>>>>>>> library(sf)
> >>>>>>>>> projdata<-st_transform(nz_height, 32759)
> >>>>>>>>> library(rtree)
> >>>>>>>>> pts <- st_coordinates(projdata)
> >>>>>>>>> rt <- RTree(st_coordinates(projdata))
> >>>>>>>>> bufferR <- c(402.336, 1609.34, 3218.69, 4828.03, 6437.38)
> >>>>>>>>> wd1 <- withinDistance(rt, pts, bufferR[1])
> >>>>>>>>>
> >>>>>>>>> but unfortunately failed (maybe newer Boost headers than yours?):
> >>>>>>>>>
> >>>>>>>>> Error in UseMethod("withinDistance", rTree) :
> >>>>>>>>>    no applicable method for 'withinDistance' applied to an object of
> >>>>>>>>>    class "c('list', 'RTree')"
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Kent
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jun 2, 2020 at 9:59 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Tue, 2 Jun 2020, Kent Johnson wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>> Date: Tue, 2 Jun 2020 02:44:17 -0500
> >>>>>>>>>>>>> From: Lom Navanyo <lomnavasia using gmail.com>
> >>>>>>>>>>>>> To: r-sig-geo using r-project.org
> >>>>>>>>>>>>> Subject: [R-sig-Geo] How to efficiently generate data of
> >>>>>>>>>>>>>         neighboring points within specified radii (distances) for
> >>>>>>>>>>>>>         each point in a given points data set.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>> I have a data set of about 3400 location points with which I am
> >>>>>>>>>>>>> trying to generate data of each point and their neighbors within
> >>>>>>>>>>>>> defined radii (e.g., 0.25, 1, and 3 miles).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> The rtree package is very fast and memory-efficient for
> >>>>>>>>>>>> within-distance calculations.
> >>>>>>>>>>>> https://github.com/hunzikp/rtree
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks! Does this also apply when the input points are in
> >>>>>>>>>>> geographical coordinates?
> >>>>>>>>>>>
> >>>>>>>>>>> Roger
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Kent Johnson
> >>>>>>>>>>>> Cambridge, MA
> >>>>>>>>>>>>
> >>>>>>>>>>>>       [[alternative HTML version deleted]]
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> R-sig-Geo mailing list
> >>>>>>>>>>>> R-sig-Geo using r-project.org
> >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Roger Bivand
> >>>>>>>>>>> Department of Economics, Norwegian School of Economics,
> >>>>>>>>>>> Helleveien 30, N-5045 Bergen, Norway.
> >>>>>>>>>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> >>>>>>>>>>> https://orcid.org/0000-0003-2392-6140
> >>>>>>>>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
