[R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR

Roger Bivand Roger.Bivand at nhh.no
Fri Aug 30 16:40:05 CEST 2013


On Fri, 30 Aug 2013, Paul Bidanset wrote:

> Thank you. I'd like to subset into a specific county. Should there be
> further partitioning from that level?
>

No idea. Please re-create your scenario by subsetting georgia and the 
coordinates to suit.

Roger
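
A minimal base-R sketch of the holdout question discussed in this thread, under stated assumptions. It does not call spgwr and is not its implementation. The data are synthetic, the helper names `adaptive_bw` and `predict_local` are hypothetical, and taking the adaptive bandwidth as the distance to the k-th nearest calibration point is a simplification that may differ from what `gw.adapt` computes. The point it illustrates is that each holdout (fit) point gets its own bandwidth and its own weighted regression, both computed from the calibration data only.

```r
# Hypothetical sketch of GWR-style prediction at holdout points (base R only).
set.seed(42)

n <- 150
dat <- data.frame(x = runif(n), y = runif(n), x1 = rnorm(n))
dat$z <- 1 + 2 * dat$x1 + rnorm(n, sd = 0.3)

# Random partition into calibration and holdout sets
hold_idx <- sample(n, 30)
train <- dat[-hold_idx, ]
hold  <- dat[hold_idx, ]

# Adaptive bandwidth (assumed rule): distance from each fit point to its
# k-th nearest calibration point, with k = proportion q of the calibration set
adaptive_bw <- function(fit_xy, data_xy, q) {
  k <- max(2, ceiling(q * nrow(data_xy)))
  apply(fit_xy, 1, function(p) {
    d <- sqrt((data_xy[, 1] - p[1])^2 + (data_xy[, 2] - p[2])^2)
    sort(d)[k]
  })
}

# One kernel-weighted regression per fit point, using calibration data only
predict_local <- function(train, hold, q) {
  data_xy <- as.matrix(train[, c("x", "y")])
  fit_xy  <- as.matrix(hold[, c("x", "y")])
  bw <- adaptive_bw(fit_xy, data_xy, q)
  vapply(seq_len(nrow(hold)), function(i) {
    d2 <- (data_xy[, 1] - fit_xy[i, 1])^2 + (data_xy[, 2] - fit_xy[i, 2])^2
    w  <- exp(-0.5 * d2 / bw[i]^2)   # Gaussian kernel weights
    fit <- lm(z ~ x1, data = train, weights = w)
    unname(predict(fit, newdata = hold[i, , drop = FALSE]))
  }, numeric(1))
}

pred <- predict_local(train, hold, q = 0.2)
rmse <- sqrt(mean((hold$z - pred)^2))   # holdout prediction error
```

Replacing the random `sample()` split with a geographical one (for example, all points with `x` below some threshold) gives the other partition mentioned in this thread, where the fit points lie far from the data points.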

>
> On Fri, Aug 30, 2013 at 10:19 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>
>>> Alrighty then!
>>>
>>
>> Thanks. Now make this your case by subsetting georgia in a way that
>> matches your case (all counties west of x?, random set?), and we may be
>> getting closer. In the geographical partition, the fit points are all a
>> long way from the data points; in the random case, they aren't grouped in
>> the same way. You may also need to run the model twice, passing the fitted
>> model (fit.points == data.points) through to the next stage, but I'm unsure
>> about that.
>>
>> Roger
>>
>>
>>> Say I create this adaptive bandwidth model using the original dataset
>>> "georgia"
>>>
>>> coords <- cbind(georgia$x, georgia$y)
>>> bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>     PctBlack, data=georgia, coords=coords, adapt=TRUE, gweight=gwr.Gauss,
>>>     method="aic")
>>> bw1 <- gw.adapt(coords, coords, quant=bwsel)
>>> model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>     PctBlack, data=georgia, coords=coords, bandwidth=bw1, hatmatrix=TRUE)
>>> model1
>>>
>>> Suppose I receive an updated data set (same dependent and independent
>>> variables) and I wish to test the above model1's ability to predict the
>>> dependent variable of these new data points. If this were a basic lm
>>> regression in R, I would use the "predict()" command. I wish to better
>>> understand how I would do so using a GWR model. I found the below
>>> procedure, but I would like to know first if it is capable of accomplishing
>>> this task, and secondly, if I am specifying it correctly. It seems to me
>>> that this procedure, as it stands, doesn't take into account the
>>> appropriate bandwidths for the new data, say, "georgiaNewData"
>>>
>>> PredictionsOfNewData <- gwr(PctBach ~ TotPop90 + PctRural + PctEld +
>>>     PctFB + PctPov + PctBlack, data=gSRDF, adapt=TRUE, gweight=gwr.Gauss,
>>>     method="aic", bandwidth=bw1, predictions=TRUE,
>>>     fit.points=georgiaNewData)
>>> PredictionsOfNewData
>>>
>>> Thanks in advance for guidance and insight...
>>>
>>>
>>> On Fri, Aug 30, 2013 at 9:01 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>> wrote:
>>>
>>>> Provide a reproducible code example of your problem using a built-in data
>>>> set. No reproducible example, no response, as I cannot guess (and likely
>>>> nobody else can either) what your specific misunderstanding is. Code using,
>>>> for example, the Georgia data set in the package. You seem to be assuming
>>>> that you understand how GWR works; I don't think that you do, so you have
>>>> to show what you mean in code.
>>>>
>>>> Roger
>>>>
>>>>
>>>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>>>
>>>>> Roger,
>>>>>
>>>>> I think all I would like to know is if it is possible to apply a
>>>>> calibrated GWR model to a hold-out sample, and if so, what the most
>>>>> accurate way to do so is. I understand the pitfalls of GWR but would like
>>>>> to learn as much as I can before progressing to the next spatial
>>>>> methodology I learn in R.
>>>>>
>>>>>
>>>>> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>> wrote:
>>>>>
>>>>>> Paul, Luis,
>>>>>>
>>>>>> I suspect that your speculations are completely wrong-headed. Please
>>>>>> provide a reproducible example with a built-in data set, so that there is
>>>>>> at least minimal clarity in what you are guessing. Note in addition that
>>>>>> GWR as a technique should not be used for anything other than exploration
>>>>>> of possible mis-specification in the underlying model with the given data,
>>>>>> as patterning in coefficients is induced by GWR for simulated covariates
>>>>>> with no pattern.
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>>>>>
>>>>>>>> Thank you Luis. When calibrating the adaptive model, using adapt=TRUE
>>>>>>>> in the bandwidth selection created the proportion you speak of, which
>>>>>>>> then allowed me to create a bandwidth matrix using gw.adapt. However,
>>>>>>>> this has not worked for me with holdout samples. Have you had success
>>>>>>>> in this regard?
>>>>>>>
>>>>>>> Now I get what you mean. Let's show an example:
>>>>>>>
>>>>>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>>>>>> m <- gwr(var ~ var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>>>>>
>>>>>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while
>>>>>>> you are fitting "newdata" later on using that previously found bw. I had
>>>>>>> not thought about it previously. Let's see whether someone else can help
>>>>>>> you (us).
>>>>>>>
>>>>>>>> I do not know the intended influence of these "fit.points". I would
>>>>>>>> think that new localized regressions are not calculated, as we're
>>>>>>>> testing the model and previous data points' ability to predict for
>>>>>>>> these new ones, but I could be wrong. My current method, however, is
>>>>>>>> producing much poorer results with the holdouts, which I am fairly
>>>>>>>> sure is related to my inability to incorporate the new points'
>>>>>>>> necessary bandwidths.
>>>>>>>
>>>>>>> Coming back to the previously created example, imagine that "newdata"
>>>>>>> is a single point that you want to fit. Imagine now that "yourdata" is
>>>>>>> a sample with 1000 cases. Then you are getting 1000 models with 1000
>>>>>>> different intercepts and 1000 different beta values to adjust var1,
>>>>>>> right? Which of all these parameters do you use for fitting "newdata"?
>>>>>>> And something else: what would happen with "newdata" if it is far
>>>>>>> enough away from "yourdata" and we were using a fixed bandwidth?
>>>>>>>
>>>>>>>> On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelayo84 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear Paul,
>>>>>>>>>
>>>>>>>>> I am dealing with this kind of problem right now, and if I am not
>>>>>>>>> wrong, when you want to apply an adaptive bandwidth, you should supply
>>>>>>>>> a value for the "adapt" parameter instead of the "bandwidth" parameter.
>>>>>>>>> This value lies between 0 and 1 and indicates the proportion of cases
>>>>>>>>> around your regression point that should be included to estimate each
>>>>>>>>> local model. So depending on the number of points around each case,
>>>>>>>>> the model will use a different bandwidth for each point to be fitted.
>>>>>>>>>
>>>>>>>>> Related to your question, do you know what influence the data passed
>>>>>>>>> in the "data" parameter have on the data to be fitted (passed in the
>>>>>>>>> "fit.points" parameter)? I mean, you have to obtain new local models
>>>>>>>>> (one for each point to be fitted), so I do not understand whether the
>>>>>>>>> "data" parameter is used somehow...
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Luis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidanset at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>>
>>>>>>>>>> I was curious if anyone has had experience applying an SPGWR model
>>>>>>>>>> with an adaptive bandwidth matrix to a holdout or validation sample.
>>>>>>>>>> I am using the "fit.points" argument, which does not seem to allow
>>>>>>>>>> for a new bandwidth calibrated around the holdout sample's XY
>>>>>>>>>> coordinates. Any direction would be greatly appreciated. I am also
>>>>>>>>>> open to other viable methods.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Paul
>>>>>>>>>>
>>>>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>
>>>>>> --
>>>>>> Roger Bivand
>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>> e-mail: Roger.Bivand at nhh.no
>
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
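
Earlier in this thread it is noted that GWR induces patterning in coefficients for simulated covariates with no pattern. A small simulation can illustrate the kind of effect meant. This is a hedged sketch with hypothetical names and arbitrary settings (fixed Gaussian bandwidth of 0.15, 200 random points), written in base R and not using spgwr: the true slope is constant everywhere and the covariate has no spatial structure, yet the locally estimated slopes vary across space, and neighbouring locations get similar values because their kernel weights overlap almost completely.

```r
# Hypothetical simulation: constant true slope, covariate with no spatial pattern
set.seed(1)
n  <- 200
xy <- cbind(runif(n), runif(n))   # random locations in the unit square
x1 <- rnorm(n)                    # covariate unrelated to location
z  <- 1 + 2 * x1 + rnorm(n)       # same relationship everywhere

bw <- 0.15  # fixed Gaussian bandwidth, chosen arbitrarily for illustration

# Local slope at each data point from a kernel-weighted regression
local_slope <- vapply(seq_len(n), function(i) {
  d2 <- (xy[, 1] - xy[i, 1])^2 + (xy[, 2] - xy[i, 2])^2
  w  <- exp(-0.5 * d2 / bw^2)
  unname(coef(lm(z ~ x1, weights = w))["x1"])
}, numeric(1))

# Index of each point's nearest neighbour
dmat <- as.matrix(dist(xy))
diag(dmat) <- Inf
nn <- apply(dmat, 1, which.min)

range(local_slope)                   # spread of local estimates around 2
cor(local_slope, local_slope[nn])    # similarity between neighbouring estimates
```

Mapping `local_slope` against `xy` would show smooth regions of higher and lower values, even though nothing in the data-generating process varies over space.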


