[R-sig-Geo] DCluster

Jakob Petersen jakob.petersen at qmul.ac.uk
Mon Mar 28 16:42:18 CEST 2005

Hi r-sig-geo,
I am looking at the spatial distribution of poor households in a region 
comprising a gradient of urban-rural postcodes. The data are counts and 
they fit a negative binomial distribution rather than a Poisson 
distribution.

I am applying DCluster (ver. 0.1-3, Windows) and would be grateful for 
advice on a few topics.

1. GAM

A) As I understand it, the default setting is based on a Poisson 
distribution. This produces some plausible clusters, but I wonder 
whether I could set up opgam so that it uses a negative binomial 
distribution (for which I have the parameters for the ‘disease’ 
variable: size and mu), or so that it uses a bootstrap procedure instead. 
Some of the internal functions, like opgam.iscluster.negbin, seem to 
support this, but I am uncertain about how to incorporate them.
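
To make the parametrisation concrete, here is a minimal sketch in base R. It only illustrates how size and mu relate to the size and prob arguments that rnbinom expects; the numeric values are made up for illustration (they are not my fitted parameters), and nothing here calls DCluster.

```r
# Illustrative values only, not the fitted parameters for the poverty counts
size <- 2.5   # dispersion parameter of the negative binomial
mu   <- 4     # mean count

# Base R's negative binomial functions take size together with either
# prob or mu; the two forms are linked by prob = size / (size + mu)
prob <- size / (size + mu)

# The densities agree under both parametrisations
same_density <- isTRUE(all.equal(
  dnbinom(0:10, size = size, mu = mu),
  dnbinom(0:10, size = size, prob = prob)
))

# The variance exceeds the mean (overdispersion): mu + mu^2 / size
theoretical_var <- mu + mu^2 / size

# Simulated counts using the (size, mu) form
set.seed(1)
x <- rnbinom(1000, size = size, mu = mu)
```

If the fitted parameters are held as size and mu, the same conversion would give the prob value that a call of the form rnbinom(n, size, prob) needs.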

B) To reduce the multiple testing problem (Waller & Gotway 2004, 
“Applied Spatial Statistics for Public Health Data”, Wiley, p.208) I 
wonder whether to set radius to <50% of step size, e.g. 100m radius in a 
300m grid, so that the smallest circles won't touch?
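
A quick check of the geometry behind this, as a sketch in base R. It assumes circles of equal radius centred on a regular square grid; the variable names are illustrative and nothing here calls DCluster.

```r
# Example values from the question above: 100 m radius on a 300 m grid
step   <- 300   # grid spacing in metres
radius <- 100   # circle radius in metres

# Circles centred on adjacent grid points are disjoint when 2 * radius < step
circles_disjoint <- 2 * radius < step

# Gap between the edges of two horizontally or vertically adjacent circles
gap <- step - 2 * radius

# Fraction of each grid cell's area that lies inside a circle
coverage <- pi * radius^2 / step^2
```

With these numbers the circles are 100 m apart, but only about a third of the area falls inside any circle, so the reduction in overlapping tests would come at the cost of leaving some postcodes outside every circle.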

2. Besag-Newell

I am getting results with ‘poisson’ (almost everything becomes a cluster 
- possibly because the sites are clumped and not randomly distributed) 
and with ‘permutation’, but I wonder how ‘negbin’ is used. Not like this:

> bnresults<-opgam(pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=bn.iscluster, set.idxorder=TRUE, k=20, model="negbin",
+ R=100, mle=calculate.mle(pcpoor) )
Error in rnbinom(n, size, prob) : invalid arguments

3. Kulldorff & Nagarwalla

Again I struggle with the parameters. Not like this:

> #K&N's method over the centroids
> mle<-calculate.mle(pcpoor, model="negbin")
Error in while (((abs(m - m0) > tol * (m + m0)) || (abs(v - v0) > tol * :
        missing value where TRUE/FALSE needed
> knresults<-opgam(data=pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=kn.iscluster, fractpop=.5, R=100, model="negbin", mle=mle)
Error in rnbinom(n, size, prob) : invalid arguments

4. Turnbull

Is Turnbull's analysis possible in DCluster yet? There are some 
references to it in the manual, but I haven’t been able to locate it.

5. General

A) I am considering increasing the study area (I am currently working 
with 1262 postcode points) and wonder what the limits might be for a 
desktop PC. I gather that the distance matrices (created by tripack or 
spdep) could be a limiting factor? Would it be an idea to run this step 
first and, once the table is created, run the cluster detection algorithm?
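
As a rough back-of-the-envelope check, the sketch below estimates the memory a dense distance matrix would need. It assumes a full n x n matrix of 8-byte doubles; whether tripack, spdep or DCluster actually store distances this way is an assumption on my part, and the larger values of n are hypothetical.

```r
# Current number of postcode points
n <- 1262

# Full n x n matrix of doubles (8 bytes each)
bytes_full <- n^2 * 8
mib_full   <- bytes_full / 1024^2

# Lower triangle only, as stored by base R's dist(): n * (n - 1) / 2 doubles
bytes_tri <- n * (n - 1) / 2 * 8

# How the full matrix grows for some hypothetical larger study areas
sizes  <- c(1262, 5000, 10000, 50000)
growth <- data.frame(n = sizes, full_MiB = sizes^2 * 8 / 1024^2)
print(growth)
```

Under that assumption the current data set needs only about 12 MiB, but the requirement grows with the square of the number of points, so ten times as many postcodes would need a hundred times the memory.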

B) I wonder whether permutation tests are always superior to tests based 
on standard statistical distributions, and if not, then why not?

Best wishes, Jakob

Jakob Petersen
GISc student (MSc)
Birkbeck, University of London
