[R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR

Roger Bivand Roger.Bivand at nhh.no
Fri Aug 30 15:01:50 CEST 2013


Provide a reproducible code example of your problem using a built-in data
set. No reproducible example, no response, as I cannot guess (and likely
nobody else can either) what your specific misunderstanding is. Write code
using, for example, the Georgia data set in the package. You seem to be
assuming that you understand how GWR works; I don't think that you do, so
you have to show what you mean in code.
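
A minimal sketch of what such a reproducible example might look like, using
the georgia data shipped with spgwr. The 80/20 random split, the seed, and the
object names (train, hold, q, m, rmse) are illustrative assumptions, not
anything taken from this thread; the formula follows the package's own Georgia
example. As the spgwr documentation describes it, gwr() runs one local
regression at each fit point using the observations in "data", and with
"adapt" set, the kernel at each fit point spans that proportion of the nearest
calibration observations. The "pred" column is what predictions=TRUE is
assumed to return here; check names(m$SDF) before relying on it.

```r
# Sketch only: adaptive-bandwidth GWR calibrated on one subset of the
# built-in Georgia data and evaluated at held-out counties.
library(spgwr)   # attaches sp as a dependency

data(georgia)    # provides gSRDF, a SpatialPolygonsDataFrame of counties

# Hypothetical split: hold out roughly 20% of the counties at random.
set.seed(1)
n <- nrow(gSRDF@data)
is_hold <- seq_len(n) %in% sample(n, size = round(0.2 * n))
train <- gSRDF[!is_hold, ]
hold  <- gSRDF[is_hold, ]

# Formula as in the spgwr Georgia example.
f <- PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov + PctBlack

# With adapt = TRUE, gwr.sel() returns a proportion of observations
# (between 0 and 1), not a distance.
q <- gwr.sel(f, data = train, adapt = TRUE)

# Local regressions are estimated at the holdout locations from the
# calibration data; fit.points must carry the right-hand-side variables
# for predictions to be computed.
m <- gwr(f, data = train, adapt = q, fit.points = hold,
         predictions = TRUE)

# Assumed column name for the geographically weighted predictions.
pred <- m$SDF$pred
rmse <- sqrt(mean((hold$PctBach - pred)^2))
rmse
```

Comparing rmse with the same quantity from a plain lm() fitted on train and
predicted on hold would show whether the local fits help out of sample at all.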

Roger

On Fri, 30 Aug 2013, Paul Bidanset wrote:

> Roger,
>
> I think all I would like to know is if it is possible to apply a calibrated
> GWR model to a hold-out sample, and if so, what the most accurate way to do
> so is. I understand the pitfalls of GWR but would like to learn as much as
> I can before progressing to the next spatial methodology I learn in R.
>
>
> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> Paul, Luis,
>>
>> I suspect that your speculations are completely wrong-headed. Please
>> provide a reproducible example with a built-in data set, so that there is
>> at least minimal clarity in what you are guessing. Note in addition that
>> GWR as a technique should not be used for anything other than exploration
>> of possible mis-specification in the underlying model with the given data,
>> as patterning in coefficients is induced by GWR for simulated covariates
>> with no pattern.
>>
>> Roger
>>
>>
>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>
>>>> Thank you Luis. When calibrating the adaptive model, using adapt=TRUE in
>>>> the bandwidth selection created the proportion you speak of, which then
>>>> allowed me to create a bandwidth matrix using gwr.adapt. However, this has
>>>> not worked for me with holdout samples. Have you had success in this regard?
>>>>
>>> Now I get what you mean. Let's show an example:
>>>
>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>> m <- gwr(var~var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>
>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while you
>>> are fitting "newdata" later on using that previously found bw. I had not
>>> thought about it previously. Let's see whether someone else can help you
>>> (us).
>>>
>>>
>>>> I do not know the intended influence of these "fit.points". I would think
>>>> that new localized regressions are not calculated, as we're testing the
>>>> ability of the model and the previous data points to predict for these new
>>>> ones, but I could be wrong. My current method, however, is producing much
>>>> poorer results with the holdouts, which I am fairly sure is related to my
>>>> inability to incorporate the new points' necessary bandwidths.
>>>>
>>> Coming back to the previously created example, imagine that "newdata" is a
>>> single point that you want to fit. Imagine now that "yourdata" is a sample
>>> with 1000 cases. Then you are getting 1000 models with 1000 different
>>> intercepts and 1000 different beta values to adjust var1, right? Which of
>>> all these parameters do you use for fitting "newdata"? And something else:
>>> what would happen with "newdata" if it were far enough away from "yourdata"
>>> and we were using a fixed bandwidth?
>>>
>>>> On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelayo84 at gmail.com> wrote:
>>>>
>>>>> Dear Paul,
>>>>>
>>>>> I am dealing with this kind of problem right now, and if I am not wrong,
>>>>> when you want to apply an adaptive bandwidth, you should supply a value
>>>>> for the "adapt" parameter instead of the "bandwidth" parameter. This
>>>>> value lies between 0 and 1 and indicates the proportion of cases around
>>>>> your regression point that should be included to estimate each local
>>>>> model. So, depending on the number of points around each case, the model
>>>>> will use a different bandwidth for each point to be fitted.
>>>>>
>>>>> Related to your question, do you know what influence the data passed in
>>>>> the "data" parameter have on the data to be fitted (passed in the
>>>>> "fit.points" parameter)? I mean, you have to obtain new local models
>>>>> (one for each point to be fitted), so I do not understand whether the
>>>>> "data" parameter is used somehow...
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Luis
>>>>>
>>>>>
>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidanset at gmail.com> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> I was curious if anyone has had experience applying an SPGWR model with
>>>>>> an adaptive bandwidth matrix to a holdout or validation sample. I am
>>>>>> using the "fit.points" argument, which does not seem to allow for a new
>>>>>> bandwidth calibrated around the holdout sample's XY coordinates. Any
>>>>>> direction would be greatly appreciated. I am also open to other viable
>>>>>> methods.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no