[R-sig-Geo] efficient code/function for rectangular SP weight Matrix and gwr
Sam Field
fieldsh at mail.med.upenn.edu
Wed May 23 17:24:51 CEST 2007
Roger Bivand wrote:
> On Tue, 22 May 2007, Sam Field wrote:
>
>
>> In the event that others are following this thread, I have included
>> some R code for constructing a rectangular weight matrix from two sets
>> of coordinates (in feet). The first set of coords (nhood_pts[,1] and
>> nhood_pts[,2]) comes from a regular grid; the second set refers to the
>> locations of patients (combo2$X and combo2$Y). I don't provide the data,
>> only the code below. The original data consist of 1,000 grid points and
>> 30,000+ patients. It takes 30 minutes on my fairly speedy computer to
>> complete the first step: extracting pair-wise distances less than a
>> given bandwidth (D). I store the neighbor ids and counts in separate
>> lists. The rest of the code calculates kernel weights based on these
>> distances. The code is not efficient from any number of standpoints, but
>> storing these distances in a list rather than a matrix avoided running
>> into memory allocation problems. Here is the code.
>>
>
> Would spDistsN1() in sp have helped to speed things up a little? Looping
> over the 1k grid points and passing in the 30k patient points shouldn't
> be very demanding (34 seconds on an oldish box, more if you need Great
> Circle distances). For heavyweight things, quadtrees would be a better
> choice; I have an unpublished package for approximate nearest neighbours.
>
> Roger
>
>
The short answer is yes. The whole calculation went from 30 minutes to
under 30 seconds!
Thanks, Roger! R is well documented, but these lists are invaluable!
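
In case it is useful to anyone else, the distance step now looks roughly
like this (a sketch only; as in the code quoted below, nhood_pts is a
two-column matrix of grid coordinates and combo2$X / combo2$Y are the
patient coordinates, all in feet):

library(sp)

pat_xy   <- cbind(combo2$X, combo2$Y)
w_raw    <- vector("list", nrow(nhood_pts))
w_raw.id <- vector("list", nrow(nhood_pts))
for (i in 1:nrow(nhood_pts)) {
  # planar distances from every patient to grid point i
  d <- spDistsN1(pat_xy, nhood_pts[i, ], longlat = FALSE)
  w_raw.id[[i]] <- which(d < D)
  w_raw[[i]]    <- d[w_raw.id[[i]]]
}
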
I am using these weights in a logistic GWR. The bootstrapping step took
3 days. Has anyone written R functions for logistic GWR (or any other GLM,
for that matter)? I need a function that allows for an offset term in the
linear predictor. What I am using now works (looping through glm()), but on
this problem it takes a very, very long time.
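
For what it is worth, the local fit I am looping over is essentially this
(a sketch; y is the 0/1 outcome, off is the offset evaluated from the global
model without its intercept, and w and w_raw.id are the kernel weights and
neighbor ids built in the code below):

local_b0 <- rep(NA, nrow(nhood_pts))
for (i in 1:nrow(nhood_pts)) {
  idx <- w_raw.id[[i]]                 # patients within the bandwidth
  # intercept-only weighted logit; quasibinomial avoids the warning
  # glm() gives about non-integer successes with fractional weights
  fit <- glm(y[idx] ~ 1, family = quasibinomial,
             weights = w[[i]], offset = off[idx])
  local_b0[i] <- coef(fit)[1]
}
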
Sam
Apologies to Roger for sending this directly.
>> # define bandwidth: 2 miles, in feet
>> D <- 2*5280
>>
>>
>> # This builds a list of the grid point-to-patient distances that are < D
>> w_raw <- vector("list", length(nhood_pts[,1]))
>> for (i in 1:length(nhood_pts[,1])){
>>   w_raw[[i]] <- rep(NA, length(combo2$X))}
>>
>> system.time(
>>   for (i in 1:length(nhood_pts[,1])){
>>     for (j in 1:length(combo2$X)){
>>       # Euclidean distance from grid point i to patient j (in feet)
>>       d.ij <- sqrt((nhood_pts[,1][i] - combo2$X[j])^2 +
>>                    (nhood_pts[,2][i] - combo2$Y[j])^2)
>>       if (d.ij < D) w_raw[[i]][j] <- d.ij}})
>>
>>
>>
>> # ids of the patients within the bandwidth of each grid point
>> w_raw.id <- vector("list", length(nhood_pts[,1]))
>> for (i in 1:length(nhood_pts[,1])){
>>   w_raw.id[[i]] <- which(!is.na(w_raw[[i]]))}
>>
>> # keep only the within-bandwidth distances
>> for (i in 1:length(nhood_pts[,1])){
>>   w_raw[[i]] <- w_raw[[i]][w_raw.id[[i]]]}
>>
>> # neighbor counts per grid point
>> w_raw.count <- rep(1, length(nhood_pts[,1]))
>> for (i in 1:length(nhood_pts[,1])){
>>   w_raw.count[i] <- length(w_raw[[i]])}
>>
>> # bisquare kernel weights
>> w_weights <- vector("list", length(nhood_pts[,1]))
>> for (i in 1:length(nhood_pts[,1])){
>>   w_weights[[i]] <- (1 - (w_raw[[i]]/D)^2)^2}
>>
>> # weights standardised to sum to 1 at each grid point
>> w <- vector("list", length(nhood_pts[,1]))
>> for (i in 1:length(nhood_pts[,1])){
>>   w[[i]] <- w_weights[[i]]/sum(w_weights[[i]])}
>>
>>
>>
>>
>>
>> Roger Bivand wrote:
>>
>>> On Fri, 11 May 2007, Stéphane Dray wrote:
>>>
>>>
>>>
>>>> Hi Sam,
>>>>
>>>> I think that this question is quite general and could interest other
>>>> people, including me, with very different aims. I have developed a
>>>> method to look for relationships between two data sets that have been
>>>> sampled over the same area but at different locations. In my example,
>>>> the two samples are two polygon layers. In this approach, I compute a
>>>> rectangular weighting matrix where each weight corresponds to the area
>>>> of intersection between polygons of the two layers. I have also used
>>>> the matrix form to store these weights (my data set was very small
>>>> compared to yours). I remember that Roger was also interested in these
>>>> rectangular weights in another context. Here we have two different problems:
>>>> - how to compute these kinds of weights
>>>> - how to store them.
>>>>
>>>> For the first point, I think the solution is different for each
>>>> method/application. We could develop/extend the classical tools for
>>>> square weights (one set of spatial units) to rectangular weights (two
>>>> sets of spatial units).
>>>> For the second point, it would probably be interesting to define a class
>>>> of object in spdep. nb objects are lists, and I think that lists would
>>>> also be the solution for rectangular neighborhoods.
>>>>
>>>> Consider two sets of spatial units (A and B) with na and nb units,
>>>> respectively. We could store the neighbors in a list of length 2. The
>>>> first element of this list is a list of length na, in which the j-th
>>>> element is a vector of the neighbors of the j-th unit of layer A; these
>>>> neighbors are spatial units of layer B. The second element of the global
>>>> list is a list of length nb, where each element is likewise a vector of
>>>> neighbors (see the sketch below).
>>>>
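>>>> In R terms, something like this (only a rough sketch, with invented unit
>>>> counts and neighbors):
>>>>
>>>> # na = 3 units in layer A, nb = 5 units in layer B
>>>> nb.A <- list(c(2L, 5L), integer(0), c(1L, 3L, 4L))  # B-neighbors of each A unit
>>>> nb.B <- list(3L, 1L, 3L, 3L, 1L)                    # A-neighbors of each B unit
>>>> rect.nb <- list(A = nb.A, B = nb.B)
>>>>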
>>>> I think that we have to think about a class of object that could be
>>>> useful for everybody dealing with this kind of rectangular weights. If
>>>> this class is properly defined (second point), we could then develop
>>>> tools to construct this kind of neighborhood (first point). A possible
>>>> extension to more than two data sets could also be taken into account in
>>>> this discussion.
>>>>
>>>>
>>>>
>>> I would welcome input on this. I'm looking at an alternative weights
>>> representation through classes in the Matrix package, which is evolving
>>> fast, and which seems to be promising. If the dimnames slot is used to
>>> hold the region.id values, it might be possible to make progress.
>>>
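>>> For instance, something like this (only a sketch with made-up indices and
>>> weights):
>>>
>>> library(Matrix)
>>> # na x nb sparse weights matrix; region.id values kept in the dimnames
>>> W <- sparseMatrix(i = c(1, 1, 3, 3, 3), j = c(2, 5, 1, 3, 4),
>>>                   x = c(0.5, 0.5, 0.3, 0.4, 0.3), dims = c(3, 5),
>>>                   dimnames = list(c("a1", "a2", "a3"),
>>>                                   c("b1", "b2", "b3", "b4", "b5")))
>>>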
>>> Best wishes,
>>>
>>> Roger
>>>
>>>
>>>
>>>> Cheers,
>>>>
>>>>
>>>> Sam Field wrote:
>>>>
>>>>
>>>>> List,
>>>>>
>>>>> I need to create a rectangular spatial weight matrix for two sets of n
>>>>> and m objects. I quickly run into memory allocation problems when
>>>>> constructing the full matrix in a single pass, so I am looking for a
>>>>> more efficient way of doing this. There appear to be efficient
>>>>> procedures in spdep for constructing SQUARE spatial weight matrices
>>>>> (e.g. dnearneigh()). Are there analogous procedures for constructing
>>>>> distance-based weights between two different point patterns? I am doing
>>>>> this in preparation for implementing an approximate geographically
>>>>> weighted logistic regression procedure. I am thinking about using a
>>>>> resampling procedure as an inferential frame, and would welcome
>>>>> feedback. This is what I am planning to do.
>>>>>
>>>>> I have a point pattern of 30,000 diabetic people based on where they
>>>>> lived during a 2-year period. During that period, approximately 4% of
>>>>> them developed diabetes. I am interested in isolating the impact of
>>>>> ecological factors on the geographic variation of the disease, so it is
>>>>> necessary to control for the spatial clustering of individual-level risk
>>>>> factors associated with the disease.
>>>>>
>>>>> Step 1: Estimate a logistic regression on the full sample and predict
>>>>> incident diabetes (i.e. who developed diabetes over the two-year period)
>>>>> from individual-level covariates.
>>>>>
>>>>> Step 2: Estimate a weighted logit model at each location (grid point).
>>>>> The observations would be the people (not the geographic units) and the
>>>>> weights would be kernel weights based on distance. The model would only
>>>>> contain a single freely estimated parameter, the intercept, but it would
>>>>> also contain an offset term. For each patient, the offset would simply
>>>>> be the linear predictor of the global model estimated above, evaluated
>>>>> at the observed covariate values but without the intercept. This
>>>>> effectively fixes the patient-level coefficients at their global values,
>>>>> requiring only a local estimate of the intercept. My hope is that I
>>>>> could interpret geographic variability in the intercept as evidence for
>>>>> a "location effect" net of the patient composition or "risk profile" at
>>>>> a particular location. It would probably make sense to center the X
>>>>> variables so that the intercept is interpretable and estimated in a
>>>>> region of the response plane where there is plenty of data. Ideally I
>>>>> would let the other coefficients vary locally as well, but I doubt the
>>>>> model could be estimated in large portions of the study area because of
>>>>> sparse data.
>>>>>
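>>>>> Roughly, for Steps 1 and 2 (a sketch only; the variable and data names
>>>>> are invented, and kw stands for the kernel weights at one grid point):
>>>>>
>>>>> # Step 1: global model on centered covariates
>>>>> glob <- glm(onset ~ age.c + bmi.c, family = binomial, data = patients)
>>>>> # offset = global linear predictor minus the global intercept
>>>>> off <- predict(glob, type = "link") - coef(glob)[1]
>>>>> # Step 2, at one grid point: intercept-only weighted logit
>>>>> # (fractional kernel weights make glm() warn about non-integer successes)
>>>>> fit <- glm(onset ~ 1, family = binomial, weights = kw, offset = off,
>>>>>            data = patients)
>>>>>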
>>>>> Step 3: To do inference on the location-specific intercepts, I would
>>>>> generate a sampling distribution at each location by resampling from the
>>>>> global model and repeating Step 2 for each randomly drawn sample. This
>>>>> would give me a local sampling distribution of intercept estimates at
>>>>> each location, which I could compare to the single estimate obtained
>>>>> from the observed data. The global model represents a kind of null
>>>>> because the intercept is fixed at its global value and geographic
>>>>> variability is driven entirely by the spatial clustering of
>>>>> patient-level factors.
>>>>>
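>>>>> One way to resample from the global model in Step 3 (a sketch,
>>>>> continuing the invented names from the Step 2 sketch):
>>>>>
>>>>> p.hat  <- fitted(glob)                     # global fitted probabilities
>>>>> y.star <- rbinom(length(p.hat), 1, p.hat)  # one simulated outcome vector
>>>>> # refit the local intercept-only models with y.star in place of the
>>>>> # observed outcome, and repeat this many times per location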
>>>>>
>>>>> thanks!
>>>>>
>>>>> Sam
>>>>>