[R-sig-Geo] efficient code/function for rectangular SP weight Matrix and gwr

Fri May 11 16:55:20 CEST 2007

List,

I need to create a rectangular spatial weight matrix between a set of n 
objects and a set of m objects. I quickly run into memory allocation 
problems when constructing the full matrix in a single pass, so I am 
looking for a more efficient way of doing this. There appear to be 
efficient procedures in spdep for constructing SQUARE spatial weight 
matrices (e.g. dnearneigh()). Are there analogous procedures for 
constructing distance-based weights between two different point patterns? 
I am doing this in preparation for implementing an approximate 
geographically weighted logistic regression procedure. I was thinking 
about using a resampling procedure as an inferential framework, and I 
would welcome feedback on it. This is what I was going to do.
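One way to avoid allocating the full n x m matrix is to keep only the pairs that fall within a distance band, and to find those pairs by bucketing one point set into square cells so that each query scans only nearby cells. Below is a minimal sketch in plain Python, assuming Euclidean coordinates and a fixed cut-off d_max; rect_distance_band is a hypothetical helper, not an spdep function. The stored distances can later be turned into kernel weights, and in R the resulting (i, j, distance) triplets could be held in a sparse matrix (e.g. via the Matrix package) instead of a dense one.

```python
import math
from collections import defaultdict


def rect_distance_band(from_pts, to_pts, d_max):
    """Sparse rectangular neighbour list (hypothetical helper).

    For each point in from_pts, return a list of (j, distance) pairs for the
    points in to_pts lying within d_max (Euclidean). Only close pairs are
    stored, so memory grows with the number of such pairs, not with n * m.
    """
    # Bucket the target points into square cells of side d_max.
    cells = defaultdict(list)
    for j, (x, y) in enumerate(to_pts):
        cells[(math.floor(x / d_max), math.floor(y / d_max))].append(j)

    neigh = []
    for (x, y) in from_pts:
        cx, cy = math.floor(x / d_max), math.floor(y / d_max)
        row = []
        # A target within d_max must lie in this cell or one of its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    dist = math.hypot(x - to_pts[j][0], y - to_pts[j][1])
                    if dist <= d_max:
                        row.append((j, dist))
        row.sort()  # order neighbours by target index
        neigh.append(row)
    return neigh


# Toy example (made-up coordinates): 2 grid locations, 4 people.
grid = [(0.0, 0.0), (10.0, 0.0)]
people = [(1.0, 0.0), (0.0, 2.5), (9.0, 0.0), (30.0, 30.0)]
nb = rect_distance_band(grid, people, 3.0)
```

Here nb[0] holds people 0 and 1 (distances 1.0 and 2.5) and nb[1] holds person 2; person 3 is beyond the band for both locations and is never stored.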

I have a point pattern of 30,000 people based on where they 
lived during a 2-year period. During that period, approximately 4% of
them developed diabetes. I am interested in isolating the impact of 
ecological factors on the geographic variation of the disease, so it is 
necessary to control for the spatial clustering of individual-level risk
factors associated with the disease (diabetes).

Step 1: Estimate a logistic regression using the full sample, predicting 
incident diabetes (i.e. who developed diabetes over the two-year period) 
from individual-level covariates.
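A minimal sketch of Step 1, assuming the covariates arrive as a list of numeric rows: center each covariate (so the intercept refers to an "average" person, as suggested for Step 2), fit the global logit by Newton-Raphson, and compute each person's offset as the global linear predictor without the intercept. In R the fitting itself would normally just be glm() with family = binomial; the hand-rolled solver and the names solve, center, fit_global_logit and global_offsets are hypothetical and only make the steps explicit.

```python
import math


def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def center(X):
    """Subtract each covariate's sample mean; returns (centered rows, means)."""
    p = len(X[0])
    means = [sum(row[k] for row in X) / len(X) for k in range(p)]
    return [[row[k] - means[k] for k in range(p)] for row in X], means


def fit_global_logit(Xc, y, iters=25, tol=1e-10):
    """Newton-Raphson ML fit of logit P(y=1) = b0 + sum_k b_k * xc_k; returns [b0, b1, ...]."""
    Z = [[1.0] + list(row) for row in Xc]  # prepend the intercept column
    q = len(Z[0])
    beta = [0.0] * q
    for _ in range(iters):
        grad = [0.0] * q
        H = [[0.0] * q for _ in range(q)]  # information matrix (negative Hessian)
        for z, yi in zip(Z, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, z))))
            for a in range(q):
                grad[a] += (yi - mu) * z[a]
                for c in range(q):
                    H[a][c] += mu * (1.0 - mu) * z[a] * z[c]
        step = solve(H, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def global_offsets(Xc, beta):
    """Per-person offset: the global linear predictor WITHOUT the intercept."""
    return [sum(b * v for b, v in zip(beta[1:], row)) for row in Xc]


# Toy, made-up data: one covariate, six people.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
Xc, means = center(X)
beta = fit_global_logit(Xc, y)
offsets = global_offsets(Xc, beta)
```

Because the covariates are centered, the offsets average to zero over the sample, and beta[0] + offsets[i] reproduces the full global linear predictor for person i.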

Step 2: Estimate a weighted logit model at each location (grid point). The
observations would be the people (not the geographic units) and the 
weights would be kernel weights based on distance. The model would only 
contain a single freely estimated parameter, the intercept, but it would 
also contain an offset term. For each patient, the offset term would 
simply be an evaluation of the linear predictor of the global model 
estimated above (based on the observed covariate values), but without 
the intercept. This would effectively fix the estimates of the patient 
level coefficients to their global values, requiring only a local 
estimate of the intercept. My hope is that I could interpret geographic 
variability in the intercept as evidence for a "location effect" net of 
the patient composition or "risk profile" at a particular location. It 
would probably make sense to center the X variables so that the 
intercept is interpretable and is estimated in a region of the covariate 
space where there is plenty of data. I would like to let the other 
coefficients vary locally as well, but I doubt the model could be 
estimated in large portions of the study area because of sparse data.
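A sketch of the local fit at a single grid location, assuming a Gaussian kernel (the post does not name a kernel, so gaussian_kernel and bandwidth are hypothetical choices). With the offset fixed, the weighted log-likelihood has a single free parameter, so Newton-Raphson reduces to a scalar update started from the weighted empirical logit. If the weighted data contain no events (or no non-events), the local MLE does not exist, which is exactly the sparse-data problem mentioned above; the sketch raises an error in that case. In R, I believe the equivalent call would be glm(y ~ 1, offset = off, weights = w, family = binomial), which warns about non-integer successes when the weights are fractional.

```python
import math


def gaussian_kernel(dist, bandwidth):
    """Hypothetical kernel choice: Gaussian weight for a given distance."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def local_intercept(y, offset, w, iters=50, tol=1e-10):
    """Weighted ML estimate of a0 in logit P(y_i = 1) = a0 + offset_i.

    Maximises sum_i w_i * loglik_i with the offsets held fixed (one-parameter
    Newton-Raphson). Raises ValueError when the local MLE does not exist.
    """
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    if ybar <= 0.0 or ybar >= 1.0:
        raise ValueError("no local events or no local non-events: MLE does not exist")
    obar = sum(wi * oi for wi, oi in zip(w, offset)) / sw
    a = math.log(ybar / (1.0 - ybar)) - obar  # starting value: weighted empirical logit
    for _ in range(iters):
        score = info = 0.0
        for yi, oi, wi in zip(y, offset, w):
            mu = 1.0 / (1.0 + math.exp(-(a + oi)))
            score += wi * (yi - mu)
            info += wi * mu * (1.0 - mu)
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


# Toy, made-up data for one grid location: distances, outcomes, global offsets.
dists = [0.5, 1.0, 2.0, 4.0, 0.2, 3.0]
w = [gaussian_kernel(d, 2.0) for d in dists]
y = [1, 0, 0, 1, 0, 1]
offset = [0.3, -0.2, 0.1, 0.0, -0.4, 0.2]
a_local = local_intercept(y, offset, w)
```

Repeating this for every grid location (using a sparse distance-band list so that each fit only touches nearby people) yields the surface of local intercepts.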

Step 3: If I were going to do inference on the location-specific 
intercepts, I would generate a sampling distribution at each location by 
resampling from the global model, and repeat Step 2 for each randomly 
drawn sample. This would give me a local sampling distribution of 
intercept estimates at each location, and I could compare it with the 
single estimate obtained from the observed data. The global model 
represents a kind of null, because the intercept is fixed to its global 
value and geographic variability is driven entirely by the spatial 
clustering of patient-level factors.
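A sketch of the resampling idea for one location, assuming a parametric bootstrap: draw y* from Bernoulli(sigmoid(b0_global + offset_i)), refit the local intercept with the same kernel weights, and compare the observed local intercept with the simulated ones through a two-sided Monte Carlo p-value. The function names, n_sim = 199 and the rule of skipping draws with no local events are all hypothetical choices; skipping such draws conditions the null distribution on at least one local event. The local fit is repeated here so the block runs on its own.

```python
import math
import random


def local_intercept(y, offset, w, iters=50, tol=1e-10):
    """Weighted ML estimate of a0 in logit P(y_i = 1) = a0 + offset_i (offsets fixed)."""
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    if ybar <= 0.0 or ybar >= 1.0:
        raise ValueError("local MLE does not exist")
    a = math.log(ybar / (1.0 - ybar)) - sum(wi * oi for wi, oi in zip(w, offset)) / sw
    for _ in range(iters):
        score = info = 0.0
        for yi, oi, wi in zip(y, offset, w):
            mu = 1.0 / (1.0 + math.exp(-(a + oi)))
            score += wi * (yi - mu)
            info += wi * mu * (1.0 - mu)
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


def null_distribution(b0_global, offset, w, n_sim=199, seed=1):
    """Local intercepts refitted on outcomes simulated from the global model."""
    rng = random.Random(seed)
    p_null = [1.0 / (1.0 + math.exp(-(b0_global + o))) for o in offset]
    sims = []
    for _ in range(n_sim):
        y_star = [1 if rng.random() < p else 0 for p in p_null]
        try:
            sims.append(local_intercept(y_star, offset, w))
        except ValueError:
            continue  # no local events (or non-events) in this draw: skipped
    return sims


def monte_carlo_p(observed, sims):
    """Two-sided Monte Carlo p-value: twice the smaller tail, capped at 1."""
    n = len(sims)
    upper = (1 + sum(s >= observed for s in sims)) / (n + 1)
    lower = (1 + sum(s <= observed for s in sims)) / (n + 1)
    return min(1.0, 2.0 * min(upper, lower))


# Toy, made-up inputs for one location: 50 people, global intercept -1.0.
rng = random.Random(0)
offset = [rng.gauss(0.0, 0.5) for _ in range(50)]
w = [math.exp(-0.5 * (rng.uniform(0.0, 3.0) / 1.5) ** 2) for _ in range(50)]
sims = null_distribution(-1.0, offset, w, n_sim=99, seed=2)
p = monte_carlo_p(-0.2, sims)
```

In the full analysis, each replicate would presumably draw a single y* vector for all people and reuse it at every grid location, so that the simulated intercept surfaces keep the spatial dependence induced by overlapping kernels.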

thanks!

Sam