[R-sig-Geo] DCluster questions

Fri Sep 6 23:05:42 CEST 2013

Dear James,

> I'm wondering if I can get a little advice on using DCluster.
> I have produced a map of areal incidence rates and I'd like to try and detect clusters.
> I have also implemented Bayesian smoothing and have therefore pre and post smoothing maps.

> This seems to work ok however there are a few things I'm confused on:
> 1) Should I be using this algorithm on my incidence rates pre or post Bayesian smoothing ? 
> I'm thinking that the mle expression above includes a smooth (do I understand that correctly ?) - 
> but I'd prefer to utilise my hard-won Bayesian smooth if possible.

In principle, you should use the observed and expected (from rate
standardisation) cases, not the smoothed rates. calculate.mle() does not
smooth anything: it just groups the data in a suitable list to be used
when resampling from the desired distribution to compute p-values, i.e.,
it computes the summary statistics and parameters to be used in the
bootstrapping.
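
To make "observed and expected cases" concrete, here is a minimal sketch
of the crudest form of standardisation, assuming a single overall rate
applied to each area's population (no age or sex strata). It is written
in Python for illustration only, and the function name and the numbers
are made up; it is not code from DCluster.

```python
def expected_cases(observed, population):
    """Expected count per area if every area shared the overall rate.

    Assumption: one overall rate, no stratification by age or sex.
    """
    overall_rate = sum(observed) / sum(population)
    return [overall_rate * p for p in population]


# Made-up example data for three areas.
observed = [4, 10, 6]
population = [1000, 2000, 2000]

expected = expected_cases(observed, population)
# Ratio of observed to expected cases in each area.
smr = [o / e for o, e in zip(observed, expected)]
```

The observed and expected counts, rather than a smoothed rate, are what
the resampling needs as input.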

> 2) Will opgam/kn.iscluster only detect "hotspots" or will they also detect 
> "coldspots" i.e. areas of statistically unlikely lower incidence rates ?

You get a p-value, so you could take the areas with very large p-values
as cold-spots. But detection of cold-spots is not of interest, is it?

> 3) I'm not familiar with bootstrapping - how many bootstraps should I be running and why 
> (i.e. - what should I set R to) ?

I believe that 99 should be fine, but you may increase it a bit. With 99
replicates the smallest p-value you can obtain is 0.01, so you can only
detect clusters down to a significance level of 0.01. A higher number of
replicates will help you to detect clusters which are significant below
this 0.01 level (which may be useful if you correct for multiple
testing, see below).
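
The link between the number of replicates and the smallest attainable
p-value can be sketched as follows. This assumes the usual rank-based
Monte Carlo p-value, (k + 1) / (R + 1), where k is the number of
simulated statistics at least as large as the observed one. The function
names are hypothetical and the code is a Python illustration, not what
DCluster runs internally.

```python
def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank-based p-value: (k + 1) / (R + 1), with k the number of
    simulated statistics greater than or equal to the observed one."""
    r = len(simulated_stats)
    k = sum(1 for s in simulated_stats if s >= observed_stat)
    return (k + 1) / (r + 1)


def smallest_p_value(r):
    """Smallest p-value attainable with r replicates (the k = 0 case)."""
    return 1 / (r + 1)
```

With R = 99 the floor is 1 / 100, which is where the 0.01 limit comes
from; R = 999 would lower it to 0.001.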

> 
> 4) How do I decide what the correct value for fractpop is ? I initially had it set to .25 and I 
> was getting cluster of 50% of my cases which made no sense.

fractpop is the maximum fraction of the total population that a cluster
may contain, so you may try different (smaller) values of fractpop if
you want to detect smaller clusters.
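
As a rough illustration of how a population cap of this kind limits
cluster size, here is a sketch that grows a window around a centre one
nearest neighbour at a time and stops once the cap would be exceeded.
This is my simplified reading of the idea, written in Python with a
hypothetical function name and made-up populations; it is not the
DCluster implementation.

```python
def candidate_window_sizes(pop_by_distance, fractpop):
    """Window sizes (number of areas) allowed around one centre.

    pop_by_distance: populations of all areas, ordered nearest-first
    from the centre. A window is kept only while its cumulative
    population stays within fractpop times the total population.
    """
    limit = fractpop * sum(pop_by_distance)
    sizes = []
    running = 0
    for k, pop in enumerate(pop_by_distance, start=1):
        running += pop
        if running > limit:
            break
        sizes.append(k)
    return sizes


# Ten made-up areas of equal population.
pops = [100] * 10
small = candidate_window_sizes(pops, 0.25)
large = candidate_window_sizes(pops, 0.5)
```

A lower cap leaves fewer and smaller candidate windows, so very large
clusters can no longer be reported.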

> 5) Is there any correction for multiple testing in the opgam() command ? I have over 3000 areas - 
> do I need to set a very low alpha ?

For the Spatial Scan Statistic you should only pay attention to the
significance of the most likely cluster, as this is what Kulldorff's
test is testing for. But you can use the p-values reported for the
secondary clusters as guidance.

For GAM, you probably want to make a multiple test correction. Another
good option is to plot the centres of the clusters detected, as this will
give you an idea of the areas with significantly high risk.
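
One simple correction, which I am assuming here purely for illustration,
is Bonferroni: divide alpha by the number of tests. The sketch below
also shows how many replicates you would need for the smallest Monte
Carlo p-value, 1 / (R + 1), to reach the corrected level. The function
names are hypothetical, n_tests = 3000 is just the number of areas
mentioned in the question (the number of circles actually tested may
differ), and exact fractions are used to avoid rounding problems.

```python
import math
from fractions import Fraction


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return Fraction(alpha) / n_tests


def min_replicates(alpha):
    """Smallest R such that 1 / (R + 1) <= alpha."""
    return math.ceil(1 / Fraction(alpha)) - 1


# Illustrative values only: overall alpha of 0.05 and 3000 tests.
corrected = bonferroni_alpha(Fraction(5, 100), 3000)
needed = min_replicates(corrected)
```

Bonferroni is conservative when the tests overlap heavily, as they do
for neighbouring circles, so treat the corrected level as a cautious
bound rather than an exact requirement.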

Hope this helps.

Virgilio