[R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR

Roger Bivand Roger.Bivand at nhh.no
Tue Sep 3 21:54:01 CEST 2013


On Fri, 30 Aug 2013, Roger Bivand wrote:

> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>
>> Thank you. I'd like to subset into a specific county. Should there be
>> further partitioning from that level?
>> 
>
> No idea. Please re-create your scenario by subsetting georgia and the 
> coordinates to suit.
>

library(spgwr)
example(georgia)
gSRDF1 <- gSRDF[1:100,]
gSRDF2 <- gSRDF[101:159,]
bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
   PctBlack, data=gSRDF1, adapt=TRUE, method="aic")
model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
   PctBlack, data=gSRDF1, adapt=bwsel, hatmatrix=TRUE)
PredictionsOfNewData <- gwr(PctBach ~ TotPop90 + PctRural + PctEld +
   PctFB + PctPov + PctBlack, data=gSRDF1, fit.points=gSRDF2, adapt=bwsel,
   predictions=TRUE, fittedGWRobject=model1)
plot(gSRDF2$PctBach, PredictionsOfNewData$SDF$pred)

This runs with the development version of spgwr on R-forge; with the released
version the polygons of gSRDF2 cause an error. Note your confusion about
adapt= in gwr(): if set as adapt=TRUE, this means adapt=1, which includes all
the observations in the kernel and so sets a very broad bandwidth. Pass the
proportion returned by gwr.sel() instead, as above. Never call
gw.adapt(); it isn't a user-level function, but is exposed for exploring
the inadequacies of GWR as a method. I would have appreciated an answer
as to whether your held-out test set is random or clustered, but here I've
just subsetted the data in the simplest way.
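
Whether the held-out set is random or clustered matters for how far the fit
points lie from the data points. A minimal sketch of the random case follows,
reusing the calls above. The 100/59 split, the seed, and the names f, idx,
train, test, q, fit, pred and rmse are assumptions made for illustration, and
it presumes a version of spgwr in which polygon fit points are accepted.

```r
library(spgwr)
data(georgia)   # loads gSRDF, the SpatialPolygonsDataFrame used above

f <- PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov + PctBlack

set.seed(1)                          # arbitrary seed, only for repeatability
idx   <- sample(nrow(gSRDF), 100)    # random training rows (assumed split size)
train <- gSRDF[idx, ]
test  <- gSRDF[-idx, ]

# adaptive proportion chosen on the training data only
q <- gwr.sel(f, data=train, adapt=TRUE, method="aic", verbose=FALSE)

# fitted model at the data points, then predictions at the held-out points
fit  <- gwr(f, data=train, adapt=q, hatmatrix=TRUE)
pred <- gwr(f, data=train, fit.points=test, adapt=q,
   predictions=TRUE, fittedGWRobject=fit)

# held-out root mean squared error
rmse <- sqrt(mean((test$PctBach - pred$SDF$pred)^2))
rmse
```

Comparing this value with the one obtained from the 1:100 / 101:159 split
gives a rough idea of how much the spatial arrangement of the holdout matters.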

Roger


> Roger
>
>> 
>> On Fri, Aug 30, 2013 at 10:19 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> 
>>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>>
>>>> Alrighty then!
>>>
>>> Thanks. Now make this your case by subsetting georgia in a way that
>>> matches your case (all counties west of x?, random set?), and we may be
>>> getting closer. In the geographical partition, the fit points are all a
>>> long way from the data points; in the random case, they aren't grouped in
>>> the same way. You may also need to run the model twice, passing the fitted
>>> model (fit.points == data.points) through to the next stage, but I'm
>>> unsure about that.
>>> 
>>> Roger
>>> 
>>> 
>>>> Say I create this adaptive bandwidth model using the original dataset
>>>> "georgia"
>>>> 
>>>> coords = cbind(georgia$x, georgia$y)
>>>> bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>>    PctBlack, data=georgia, adapt=TRUE, coords, gweight=gwr.Gauss,
>>>>    method="aic")
>>>> bw1 <- gw.adapt(coords, coords, quant=bwsel)
>>>> model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>>    PctBlack, data=georgia, bw=bw1, coords, hatmatrix=T)
>>>> model1
>>>> 
>>>> Suppose I receive an updated data set (same dependent and independent
>>>> variables) and I wish to test the above model1's ability to predict the
>>>> dependent variable of these new data points. If this were a basic lm
>>>> regression in R, I would use the "predict()" command. I wish to better
>>>> understand how I would do so using a GWR model. I found the below
>>>> procedure, but I would like to know first if it is capable of accomplishing
>>>> this task, and secondly, if I am specifying it correctly. It seems to me
>>>> that this procedure, as it stands, doesn't take into account the
>>>> appropriate bandwidths for the new data, say, "georgiaNewData"
>>>> 
>>>> PredictionsOfNewData <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB +
>>>>    PctPov + PctBlack, data=gSRDF, adapt=TRUE, gweight=gwr.Gauss,
>>>>    method="aic", bandwidth=bw1, predictions=TRUE, fit.points=georgiaNewData)
>>>> PredictionsOfNewData
>>>> 
>>>> Thanks in advance for guidance and insight...
>>>> 
>>>> 
>>>> On Fri, Aug 30, 2013 at 9:01 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>> wrote:
>>>>
>>>>> Provide a reproducible code example of your problem using a built-in data
>>>>> set. No reproducible example, no response, as I cannot guess (and likely
>>>>> nobody else can either) what your specific misunderstanding is. Code
>>>>> using, for example, the Georgia data set in the package. You seem to be
>>>>> assuming that you understand how GWR works; I don't think that you do, so
>>>>> you have to show what you mean in code.
>>>>> 
>>>>> Roger
>>>>> 
>>>>> 
>>>>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>>>>
>>>>>> Roger,
>>>>>>
>>>>>> I think all I would like to know is if it is possible to apply a calibrated
>>>>>> GWR model to a hold-out sample, and if so, what the most accurate way to do
>>>>>> so is. I understand the pitfalls of GWR but would like to learn as much as
>>>>>> I can before progressing to the next spatial methodology I learn in R.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>> wrote:
>>>>>>
>>>>>>> Paul, Luis,
>>>>>>>
>>>>>>> I suspect that your speculations are completely wrong-headed. Please
>>>>>>> provide a reproducible example with a built-in data set, so that there is
>>>>>>> at least minimal clarity in what you are guessing. Note in addition that
>>>>>>> GWR as a technique should not be used for anything other than exploration
>>>>>>> of possible mis-specification in the underlying model with the given data,
>>>>>>> as patterning in coefficients is induced by GWR for simulated covariates
>>>>>>> with no pattern.
>>>>>>> 
>>>>>>> Roger
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>>>>>>
>>>>>>>>> Thank you Luis. When calibrating the adaptive model, using adapt=T in the
>>>>>>>>> bandwidth selection created the proportion you speak of, which then allowed
>>>>>>>>> me to create a bandwidth matrix using gw.adapt. However, this has not
>>>>>>>>> worked for me with holdout samples. Have you had success in this regard?
>>>>>>>>
>>>>>>>> Now I get what you mean. Let's show an example:
>>>>>>>>
>>>>>>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>>>>>>> m <- gwr(var~var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>>>>>>
>>>>>>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while you
>>>>>>>> are fitting "newdata" later on using that previously found bw. I had not
>>>>>>>> thought about it previously. Let's see whether someone else can help you
>>>>>>>> (us).
>>>>>>>>
>>>>>>>>> I do not know the intended influence of these "fit.points". I would think
>>>>>>>>> that new localized regressions are not calculated, as we're testing the
>>>>>>>>> model and previous data points' ability to predict for these new ones, but
>>>>>>>>> I could be wrong. My current method, however, is producing much poorer
>>>>>>>>> results with the holdouts, which I am fairly sure is related to my
>>>>>>>>> inability to incorporate the new points' necessary bandwidths.
>>>>>>>>
>>>>>>>> Coming back to the previously created example, imagine that "newdata" is a
>>>>>>>> single point that you want to fit. Imagine now that "yourdata" is a sample
>>>>>>>> with 1000 cases. Then you are getting 1000 models with 1000 different
>>>>>>>> intercepts and 1000 different beta values to adjust var1, right? Which of
>>>>>>>> all these parameters do you use for fitting "newdata"? And something else:
>>>>>>>> what would happen with "newdata" if it is far enough away from "yourdata"
>>>>>>>> and we were using a fixed bandwidth?
>>>>>>>>
>>>>>>>>> On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelayo84 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dear Paul,
>>>>>>>>>>
>>>>>>>>>> I am dealing with this kind of problem right now, and if I am not wrong,
>>>>>>>>>> when you want to apply an adaptive bandwidth, you should supply a value
>>>>>>>>>> for the "adapt" parameter instead of the "bandwidth" parameter. This
>>>>>>>>>> value lies between 0 and 1 and indicates the proportion of cases around
>>>>>>>>>> your regression point that should be included to estimate each local
>>>>>>>>>> model. So depending on the number of points around each case, the model
>>>>>>>>>> will use a different bandwidth for each point to be fitted.
>>>>>>>>>>
>>>>>>>>>> Related to your question, do you know what influence the data passed in
>>>>>>>>>> the "data" parameter has on the data to be fitted (passed in the
>>>>>>>>>> "fit.points" parameter)? I mean, you have to obtain new local models
>>>>>>>>>> (one for each point to be fitted), so I do not understand whether the
>>>>>>>>>> "data" parameter is used somehow...
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> 
>>>>>>>>>> Luis
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidanset at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I was curious if anyone has had experience applying an SPGWR model with an
>>>>>>>>>>> adaptive bandwidth matrix to a holdout or validation sample. I am using the
>>>>>>>>>>> "fit.points" command, which does not seem to allow for a new bandwidth
>>>>>>>>>>> calibrated around the holdout sample's XY coordinates. Any direction would
>>>>>>>>>>> be greatly appreciated. I am also open to other viable methods.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>
>>>>>>> --
>>>>>>> Roger Bivand
>>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>>> e-mail: Roger.Bivand at nhh.no
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


