[R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR

Roger Bivand Roger.Bivand at nhh.no
Fri Aug 30 16:19:12 CEST 2013


On Fri, 30 Aug 2013, Paul Bidanset wrote:

> Alrighty then!

Thanks. Now make this your case by subsetting georgia in a way that 
matches your case (all counties west of x?, random set?), and we may be 
getting closer. In the geographical partition, the fit points are all a 
long way from the data points; in the random case, they aren't grouped in 
the same way. You may also need to run the model twice, passing the fitted 
model (fit.points == data.points) through to the next stage, but I'm 
unsure about that.

Roger
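
A minimal sketch of the kind of reproducible example being asked for, using
the georgia data shipped with spgwr. Several things here are assumptions for
illustration and not from the thread: the holdout size of 30, the seed, the
use of county label points as locations, and the helper name holdout_check,
which is hypothetical and not part of spgwr. The gw.adapt() call at the end
of the helper reflects my understanding of how gwr() turns the adaptive
proportion into one bandwidth per fit point; treat it as a check, not as
documentation.

```r
library(sp)
library(spgwr)

data(georgia)  # loads gSRDF, the Georgia counties as a SpatialPolygonsDataFrame

# Represent each county by its label point so that the calibration and
# holdout sets are both SpatialPointsDataFrame objects.
vars <- c("PctBach", "TotPop90", "PctRural", "PctEld", "PctFB", "PctPov",
          "PctBlack")
pts <- SpatialPointsDataFrame(coordinates(gSRDF),
                              slot(gSRDF, "data")[, vars],
                              proj4string = slot(gSRDF, "proj4string"),
                              match.ID = FALSE)

f <- PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov + PctBlack
xy <- coordinates(pts)
n <- nrow(xy)
lonlat <- isTRUE(!is.projected(pts))  # TRUE only for geographical coordinates

# Hypothetical helper (not part of spgwr): calibrate on the non-holdout
# counties, predict at the holdout counties, return RMSE and bandwidths.
holdout_check <- function(is_hold) {
  train <- pts[!is_hold, ]
  test  <- pts[is_hold, ]
  # Adaptive proportion selected on the calibration data only
  q <- gwr.sel(f, data = train, adapt = TRUE, gweight = gwr.Gauss,
               method = "aic", verbose = FALSE)
  # One local regression per holdout point, fitted to the calibration data
  fit <- gwr(f, data = train, adapt = q, gweight = gwr.Gauss,
             fit.points = test, predictions = TRUE)
  # Bandwidth at each holdout point implied by proportion q (assumed to
  # match what gwr() computes internally when adapt is given)
  bw <- gw.adapt(dp = coordinates(train), fp = coordinates(test),
                 quant = q, longlat = lonlat)
  list(q = q,
       rmse = sqrt(mean((test$PctBach - fit$SDF$pred)^2)),
       bw = bw)
}

set.seed(1)
random_hold <- seq_len(n) %in% sample(n, 30)          # random holdout
west_hold   <- seq_len(n) %in% order(xy[, 1])[1:30]   # 30 westernmost counties

res_random <- holdout_check(random_hold)
res_west   <- holdout_check(west_hold)

c(random = res_random$rmse, west = res_west$rmse)
summary(res_random$bw)
summary(res_west$bw)
```

Comparing the two RMSE values and the two bandwidth summaries shows how far
the geographical partition differs from the random one. The sketch does not
exercise the fittedGWRobject argument of gwr(), which is the hook for passing
a first-pass fit (fit.points == data.points) into a second call.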

>
> Say I create this adaptive bandwidth model using the original dataset
> "georgia"
>
> coords <- cbind(georgia$x, georgia$y)
> bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
> PctBlack, data=georgia, adapt=TRUE, coords=coords, gweight=gwr.Gauss,
> method="aic")
> bw1 <- gw.adapt(coords, coords, quant=bwsel)
> model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
> PctBlack, data=georgia, bandwidth=bw1, coords=coords, hatmatrix=TRUE)
> model1
>
> Suppose I receive an updated data set (same dependent and independent
> variables) and I wish to test the above model1's ability to predict the
> dependent variable of these new data points. If this were a basic lm
> regression in R, I would use the "predict()" command. I wish to better
> understand how I would do so using a GWR model. I found the below
> procedure, but I would like to know first if it is capable of accomplishing
> this task, and secondly, if I am specifying it correctly. It seems to me
> that this procedure, as it stands, doesn't take into account the
> appropriate bandwidths for the new data, say, "georgiaNewData":
>
> PredictionsOfNewData  <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB
> + PctPov + PctBlack, data=gSRDF, adapt=TRUE, gweight=gwr.Gauss, method =
> "aic",  bandwidth=bw1,
> predictions=TRUE, fit.points=georgiaNewData)
> PredictionsOfNewData
>
> Thanks in advance for guidance and insight...
>
>
> On Fri, Aug 30, 2013 at 9:01 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> Provide a reproducible code example of your problem using a built-in data
>> set. No reproducible example, no response, as I cannot guess (and likely
>> nobody else can either) what your specific misunderstanding is. Code it
>> using, for example, the Georgia data set in the package. You seem to be
>> assuming that you understand how GWR works; I don't think that you do, so
>> you have to show what you mean in code.
>>
>> Roger
>>
>>
>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>
>>> Roger,
>>>
>>> I think all I would like to know is if it is possible to apply a
>>> calibrated GWR model to a hold-out sample, and if so, what the most
>>> accurate way to do so is. I understand the pitfalls of GWR but would like
>>> to learn as much as I can before progressing to the next spatial
>>> methodology I learn in R.
>>>
>>>
>>> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>> wrote:
>>>
>>>> Paul, Luis,
>>>>
>>>> I suspect that your speculations are completely wrong-headed. Please
>>>> provide a reproducible example with a built-in data set, so that there is
>>>> at least minimal clarity in what you are guessing. Note in addition that
>>>> GWR as a technique should not be used for anything other than exploration
>>>> of possible mis-specification in the underlying model with the given
>>>> data, as patterning in coefficients is induced by GWR for simulated
>>>> covariates with no pattern.
>>>>
>>>> Roger
>>>>
>>>>
>>>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>>>
>>>>>> Thank you Luis. When calibrating the adaptive model, using adapt=TRUE in
>>>>>> the bandwidth selection created the proportion you speak of, which then
>>>>>> allowed me to create a bandwidth matrix using gw.adapt. However, this has
>>>>>> not worked for me with holdout samples. Have you had success in this
>>>>>> regard?
>>>>>
>>>>> Now I get what you mean. Let's show an example:
>>>>>
>>>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>>>> m <- gwr(var ~ var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>>>
>>>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while you
>>>>> are fitting "newdata" later on using that previously found bw. I had not
>>>>> thought about it previously. Let's see whether someone else can help you
>>>>> (us).
>>>>>
>>>>>> I do not know the intended influence of these "fit.points". I would think
>>>>>> that new localized regressions are not calculated, as we're testing the
>>>>>> model and previous data points' ability to predict for these new ones, but
>>>>>> I could be wrong. My current method, however, is producing much poorer
>>>>>> results with the holdouts, which I am fairly sure is related to my
>>>>>> inability to incorporate the new points' necessary bandwidths.
>>>>>
>>>>> Coming back to the previously created example, imagine that "newdata" is a
>>>>> single point that you want to fit. Imagine now that "yourdata" is a sample
>>>>> with 1000 cases. Then you are getting 1000 models with 1000 different
>>>>> intercepts and 1000 different beta values to adjust var1, right? Which of
>>>>> all these parameters do you use for fitting "newdata"? And something else:
>>>>> what would happen with "newdata" if it is far enough away from "yourdata"
>>>>> and we were using a fixed bandwidth?
>>>>>
>>>>>> On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelayo84 at gmail.com> wrote:
>>>>>>
>>>>>>> Dear Paul,
>>>>>>>
>>>>>>> I am dealing with this kind of problem right now, and if I am not wrong,
>>>>>>> when you want to apply an adaptive bandwidth, you should supply a value
>>>>>>> for the "adapt" parameter instead of for the "bandwidth" parameter.
>>>>>>> This value will be between 0 and 1 and indicates the proportion of cases
>>>>>>> around your regression point that should be included to estimate each
>>>>>>> local model. So depending on the number of points around each case, the
>>>>>>> model will use a different bandwidth for each point to be fitted.
>>>>>>>
>>>>>>> Related to your question, do you know what influence the data supplied
>>>>>>> in the "data" parameter have on the data to be fitted (supplied in the
>>>>>>> "fit.points" parameter)? I mean, you have to obtain new local models
>>>>>>> (one for each point to be fitted), so I do not understand whether the
>>>>>>> "data" parameter is used somehow...
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Luis
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidanset at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> I was curious if anyone has had experience applying an SPGWR model with
>>>>>>>> an adaptive bandwidth matrix to a holdout or validation sample. I am
>>>>>>>> using the "fit.points" argument, which does not seem to allow for a new
>>>>>>>> bandwidth calibrated around the holdout sample's XY coordinates. Any
>>>>>>>> direction would be greatly appreciated. I am also open to other viable
>>>>>>>> methods.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Paul
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>  --
>>>> Roger Bivand
>>>> Department of Economics, NHH Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>> e-mail: Roger.Bivand at nhh.no
>>>>
>>>>
>>>>
>>>
>>>
>>>
>
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


