[R-sig-Geo] DCluster
Jakob Petersen
jakob.petersen at qmul.ac.uk
Mon Mar 28 16:42:18 CEST 2005
Hi r-sig-geo,
I am looking at the spatial distribution of poor households in a region
comprising a gradient of urban-rural postcodes. The data are counts, and
they fit a negative binomial distribution better than a Poisson
distribution.
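A minimal sketch of how such overdispersion might be checked in base R. The counts below are simulated for illustration only (they are not the postcode data), and the variable names are hypothetical. It also shows how the (size, mu) parameters relate to the (size, prob) form that rnbinom() reports in its error messages.

```r
# Hypothetical counts, simulated for illustration; not the postcode data.
set.seed(1)
counts <- rnbinom(1000, size = 2, mu = 5)

# Under a Poisson model the variance equals the mean, so a
# variance-to-mean ratio well above 1 suggests overdispersion.
dispersion <- var(counts) / mean(counts)

# Method-of-moments estimates for the negative binomial:
# var = mu + mu^2 / size  =>  size = mu^2 / (var - mu)
mu_hat   <- mean(counts)
size_hat <- mu_hat^2 / (var(counts) - mu_hat)

# rnbinom() accepts either (size, mu) or (size, prob); the two
# parameterisations are linked by prob = size / (size + mu).
prob_hat <- size_hat / (size_hat + mu_hat)
```

Note that the moment estimate of size is only positive when the sample variance exceeds the mean; otherwise the resulting prob is not a valid argument for rnbinom().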
I am applying DCluster (ver. 0.1-3, Windows) and would be grateful for
advice on a few topics.
1. GAM
A) As I understand it, the default setting is based on a Poisson
distribution. This produces some not implausible clusters, but I wonder
whether I could set up opgam so that it uses a negative binomial
distribution (for which I have the parameters for the ‘disease’
variable: size and mu) or a bootstrap procedure instead. Some of
the internal functions, like opgam.iscluster.negbin, seem to support
this, but I am uncertain about how to incorporate them.
B) To reduce the multiple testing problem (Waller & Gotway 2004,
“Applied Spatial Statistics for Public Health Data”, Wiley, p. 208), I
wonder whether to set the radius to <50% of the step size, e.g. a 100m
radius in a 300m grid, so that even neighbouring circles won't touch?
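The geometry in B) can be sketched in base R: on a square grid, two circles of equal radius overlap only if their centres are closer than twice the radius. The 4 x 4 grid extent below is a hypothetical example; only the 300m step and 100m radius come from the question.

```r
# Grid step and circle radius from the example above (metres).
step   <- 300
radius <- 100

# Hypothetical 4 x 4 grid of circle centres.
centres <- expand.grid(x = seq(0, 900, by = step),
                       y = seq(0, 900, by = step))

# Smallest distance between any two distinct centres.
d       <- as.matrix(dist(centres))
min_gap <- min(d[upper.tri(d)])

# Two equal circles overlap only if their centres are closer than 2 * radius.
circles_overlap <- min_gap < 2 * radius

# Fraction of each grid cell covered by its circle; the remainder of
# the study area is never scanned when the circles do not touch.
coverage <- pi * radius^2 / step^2
```

With these values the circles do not overlap, but each circle covers only about a third of its grid cell, so non-overlapping circles trade fewer dependent tests for incomplete coverage of the region.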
2. Besag-Newell
I am getting results with ‘poisson’ (almost everything becomes a cluster,
possibly because the sites are clumped and not randomly distributed)
and with ‘permutation’, but wonder how ‘negbin’ is used. Not like
this:
> bnresults<-opgam(pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=bn.iscluster, set.idxorder=TRUE, k=20, model="negbin",
+ R=100, mle=calculate.mle(pcpoor) )
Error in rnbinom(n, size, prob) : invalid arguments
3. Kulldorff & Nagarwalla
Again I struggle with the parameters. Not like this:
> #K&N's method over the centroids
> mle<-calculate.mle(pcpoor, model="negbin")
Error in while (((abs(m - m0) > tol * (m + m0)) || (abs(v - v0) > tol * :
  missing value where TRUE/FALSE needed
> knresults<-opgam(data=pcpoor, thegrid=pcpoor[,c("x","y")], alpha=.05,
+ iscluster=kn.iscluster, fractpop=.5, R=100, model="negbin", mle=mle)
Error in rnbinom(n, size, prob) : invalid arguments
4. Turnbull
Is Turnbull analysis possible in DCluster yet? There are some
references to it in the manual, but I haven’t been able to locate it.
5. General
A) I am considering increasing the study area (currently working with 1262
postcode points) and wonder what the limits might be for a desktop PC. I
gather that the distance matrices (created by tripack or spdep) could be
a limiting factor? Would it be an idea to run this step first and, once
the table is created, run the cluster detection algorithm?
B) I wonder whether permutation tests are always superior to standard
statistical distributions, and if not, why not?
Best wishes, Jakob
Jakob Petersen
GISc student (MSc)
Birkbeck, University of London