[R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR
Roger Bivand
Roger.Bivand at nhh.no
Tue Sep 3 21:54:01 CEST 2013
On Fri, 30 Aug 2013, Roger Bivand wrote:
> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>
>> Thank you. I'd like to subset into a specific county. Should there be
>> further partitioning from that level?
>>
>
> No idea. Please re-create your scenario by subsetting georgia and the
> coordinates to suit.
>
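library(spgwr)
example(georgia)
# simplest possible split: first 100 counties to fit, remaining 59 held out
gSRDF1 <- gSRDF[1:100,]
gSRDF2 <- gSRDF[101:159,]
# with adapt=TRUE, gwr.sel() returns a proportion of observations
bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
  PctBlack, data=gSRDF1, adapt=TRUE, method="aic")
# fit at the data points, passing that proportion as adapt=
model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
  PctBlack, data=gSRDF1, adapt=bwsel, hatmatrix=TRUE)
# predict at the held-out points, passing the fitted model through
PredictionsOfNewData <- gwr(PctBach ~ TotPop90 + PctRural + PctEld +
  PctFB + PctPov + PctBlack, data=gSRDF1, fit.points=gSRDF2, adapt=bwsel,
  predictions=TRUE, fittedGWRobject=model1)
# observed against predicted values for the held-out counties
plot(gSRDF2$PctBach, PredictionsOfNewData$SDF$pred)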
This runs with the development version of spgwr on R-forge; with the
released version the polygons of gSRDF2 cause an error. Note your confusion
about adapt= in gwr(): if set as adapt=TRUE, this means adapt=1, so it
includes all the observations in the kernel, setting a very broad bandwidth.
Never call gw.adapt(); it isn't a user-level function, but is exposed for
exploring the inadequacies of GWR as a method. I would have appreciated an
answer wrt. whether your held-out test set is random or clustered, but here
I've just subsetted the data in the simplest way.
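
To make the adapt= point concrete, here is a minimal sketch in base R. It is a
conceptual illustration only and is not how spgwr computes its bandwidths
internally: the helper adaptive_bw() is hypothetical, the coordinates are
simulated rather than taken from georgia, and the rule "bandwidth = distance to
the k-th nearest data point, with k = ceiling(q * n)" is an assumption made for
the illustration. It compares the sequential split used above with a random
hold-out, and shows what a proportion of 1 implies.

```r
# Hypothetical helper: NOT part of spgwr, and not its internal algorithm.
# Assumption: the adaptive bandwidth at a fit point is the distance to its
# k-th nearest data point, where k = ceiling(q * n).
adaptive_bw <- function(fit_xy, data_xy, q) {
  n <- nrow(data_xy)
  k <- max(1L, min(n, ceiling(q * n)))
  apply(fit_xy, 1, function(p) {
    d <- sqrt((data_xy[, 1] - p[1])^2 + (data_xy[, 2] - p[2])^2)
    sort(d)[k]
  })
}

set.seed(1)
# simulated coordinates for 159 points (not the georgia data)
xy <- cbind(x = runif(159), y = runif(159))

# two ways of holding out 59 of the 159 points
seq_test  <- 101:159                  # simplest split, as in the code above
rand_test <- sort(sample(159, 59))    # random hold-out

# one bandwidth per held-out point, measured against the training points only
bw_seq  <- adaptive_bw(xy[seq_test, ],  xy[-seq_test, ],  q = 0.2)
bw_rand <- adaptive_bw(xy[rand_test, ], xy[-rand_test, ], q = 0.2)

# q = 1 (what adapt=TRUE is coerced to): every training point is inside the
# kernel, so the bandwidth reaches the farthest training point
bw_all <- adaptive_bw(xy[seq_test, ], xy[-seq_test, ], q = 1)

summary(bw_seq)
summary(bw_rand)
summary(bw_all)
```

With real data, a geographically clustered hold-out would be expected to give
larger distances from fit points to data points than a random one, which is why
the way the test set was chosen matters here.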
Roger
> Roger
>
>>
>> On Fri, Aug 30, 2013 at 10:19 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>
>>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>>
>>>> Alrighty then!
>>>
>>> Thanks. Now make this your case by subsetting georgia in a way that
>>> matches your case (all counties west of x?, random set?), and we may be
>>> getting closer. In the geographical partition, the fit points are all a
>>> long way from the data points; in the random case, they aren't grouped in
>>> the same way. You may also need to run the model twice, passing the fitted
>>> model (fit.points == data.points) through to the next stage, but I'm
>>> unsure about that.
>>>
>>> Roger
>>>
>>>
>>>> Say I create this adaptive bandwidth model using the original dataset
>>>> "georgia":
>>>>
>>>> coords = cbind(georgia$x, georgia$y)
>>>> bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>> PctBlack, data=georgia, adapt=TRUE, coords, gweight=gwr.Gauss,
>>>> method="aic")
>>>> bw1 <- gw.adapt(coords, coords, quant=bwsel)
>>>> model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
>>>> PctBlack, data=georgia, bw=bw1, coords, hatmatrix=T)
>>>> model1
>>>>
>>>> Suppose I receive an updated data set (same dependent and independent
>>>> variables) and I wish to test the above model1's ability to predict the
>>>> dependent variable of these new data points. If this were a basic lm
>>>> regression in R, I would use the "predict()" command. I wish to better
>>>> understand how I would do so using a GWR model. I found the below
>>>> procedure, but I would like to know first if it is capable of accomplishing
>>>> this task, and secondly, if I am specifying it correctly. It seems to me
>>>> that this procedure, as it stands, doesn't take into account the
>>>> appropriate bandwidths for the new data, say, "georgiaNewData":
>>>>
>>>> PredictionsOfNewData <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB +
>>>> PctPov + PctBlack, data=gSRDF, adapt=TRUE, gweight=gwr.Gauss, method="aic",
>>>> bandwidth=bw1, predictions=TRUE, fit.points=georgiaNewData)
>>>> PredictionsOfNewData
>>>>
>>>> Thanks in advance for guidance and insight...
>>>>
>>>>
>>>> On Fri, Aug 30, 2013 at 9:01 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>>>
>>>>> Provide a reproducible code example of your problem using a built-in data
>>>>> set. No reproducible example, no response, as I cannot guess (and likely
>>>>> nobody else can either) what your specific misunderstanding is. Code using,
>>>>> for example, the Georgia data set in the package. You seem to be assuming
>>>>> that you understand how GWR works; I don't think that you do, so you have
>>>>> to show what you mean in code.
>>>>>
>>>>> Roger
>>>>>
>>>>>
>>>>> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>>>>>
>>>>>> Roger,
>>>>>>
>>>>>> I think all I would like to know is if it is possible to apply a
>>>>>> calibrated GWR model to a hold-out sample, and if so, what the most
>>>>>> accurate way to do so is. I understand the pitfalls of GWR but would like
>>>>>> to learn as much as I can before progressing to the next spatial
>>>>>> methodology I learn in R.
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>>>>>
>>>>>>> Paul, Luis,
>>>>>>>
>>>>>>> I suspect that your speculations are completely wrong-headed. Please
>>>>>>> provide a reproducible example with a built-in data set, so that there is
>>>>>>> at least minimal clarity in what you are guessing. Note in addition that
>>>>>>> GWR as a technique should not be used for anything other than exploration
>>>>>>> of possible mis-specification in the underlying model with the given data,
>>>>>>> as patterning in coefficients is induced by GWR for simulated covariates
>>>>>>> with no pattern.
>>>>>>>
>>>>>>> Roger
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>>>>>>
>>>>>>>>> Thank you Luis. When calibrating the adaptive model, using adapt=TRUE in
>>>>>>>>> the bandwidth selection created the proportion you speak of, which then
>>>>>>>>> allowed me to create a bandwidth matrix using gw.adapt. However, this has
>>>>>>>>> not worked for me with holdout samples. Have you had success in this
>>>>>>>>> regard?
>>>>>>>>
>>>>>>>> Now I get what you mean. Let's show an example:
>>>>>>>>
>>>>>>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>>>>>>> m <- gwr(var~var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>>>>>>
>>>>>>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while you
>>>>>>>> are fitting "newdata" later on using that previously found bw. I had not
>>>>>>>> thought about it previously. Let's see whether someone else can help you
>>>>>>>> (us).
>>>>>>>>
>>>>>>>>> I do not know the intended influence of these "fit.points". I would
>>>>>>>>> think that new localized regressions are not calculated, as we're testing
>>>>>>>>> the model and previous data points' ability to predict for these new
>>>>>>>>> ones, but I could be wrong. My current method, however, is producing much
>>>>>>>>> poorer results with the holdouts, which I am fairly sure is related to my
>>>>>>>>> inability to incorporate the new points' necessary bandwidths.
>>>>>>>>
>>>>>>>> Coming back to the previously created example, imagine that "newdata" is a
>>>>>>>> single point that you want to fit. Imagine now that "yourdata" is a sample
>>>>>>>> with 1000 cases. Then you are getting 1000 models with 1000 different
>>>>>>>> intercepts and 1000 different beta values to adjust var1, right? Which of
>>>>>>>> all these parameters do you use for fitting "newdata"? And something else:
>>>>>>>> what would happen with "newdata" if it is far enough away from "yourdata"
>>>>>>>> and we were using a fixed bandwidth?
>>>>>>>>
>>>>>>>>> On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelayo84 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dear Paul,
>>>>>>>>>>
>>>>>>>>>> I am dealing with this kind of problem right now, and if I am not wrong,
>>>>>>>>>> when you want to apply an adaptive bandwidth, you should introduce a
>>>>>>>>>> value for the "adapt" parameter instead of for the "bandwidth" parameter.
>>>>>>>>>> This value will be between 0 and 1 and indicates the proportion of cases
>>>>>>>>>> around your regression point that should be included to estimate each
>>>>>>>>>> local model. So depending on the number of points around each case, the
>>>>>>>>>> model will use a different bandwidth for each point to be fitted.
>>>>>>>>>>
>>>>>>>>>> Related to your question, do you know what influence the data passed in
>>>>>>>>>> the "data" parameter has on the data to be fitted (passed in the
>>>>>>>>>> "fit.points" parameter)? I mean, you have to obtain new local models
>>>>>>>>>> (one for each point to be fitted), so I do not understand whether the
>>>>>>>>>> "data" parameter is used somehow...
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Luis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidanset at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I was curious if anyone has had experience applying an SPGWR model with an
>>>>>>>>>>> adaptive bandwidth matrix to a holdout or validation sample. I am using the
>>>>>>>>>>> "fit.points" argument, which does not seem to allow for a new bandwidth
>>>>>>>>>>> calibrated around the holdout sample's XY coordinates. Any direction would
>>>>>>>>>>> be greatly appreciated. I am also open to other viable methods.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>
>>>>>>> --
>>>>>>> Roger Bivand
>>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>>> e-mail: Roger.Bivand at nhh.no
>>>>>>>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no