[R-sig-Geo] Ripley's K-function and CSR/CSRT test for Very Large Dataset

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Thu May 26 11:40:55 CEST 2011


On Thu, May 26, 2011 at 9:21 AM, ruocco <ruocco at idi.ntnu.no> wrote:
> Hi,
> I am trying to apply the CSR and CSRT tests in a case-control context.
> The size of the control data is ~400,000 points. I then have a set of case
> data (around 500, and I want to verify, for each one, the interaction between
> the case and control data), and the size of these data varies from 100 to
> 4000 events.
>
> I am using Ripley's K-function and the test statistic D(h)/sqrt(var(D)),
> for each distance h in the range.
>
> The problem is that I should use Monte Carlo hypothesis testing to obtain
> the p-value, but with control data of this size the computational cost is
> prohibitive.
>
> Are there other, less computationally expensive tests of the null
> hypothesis that apply Ripley's K-function to very large datasets?

 Any reason why you can't randomly sample from your control points,
and do the analysis with 4,000 control points instead of 400,000?
Repeat that a few times with different random subsamples to get an
idea of how sensitive the results are, and that's job done.

 K-function tests for 500 cases/4000 controls should be doable on a
PC these days.
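
 A rough sketch of the subsample-and-repeat idea, in plain Python so it
stands alone. Everything in it is an assumption for illustration: the
function names are made up, the K estimate is the naive O(n^2) one with
no edge correction, `area` is the area of the study region, and the
Monte Carlo test uses random labelling with max |D(h)| as the summary
statistic rather than the standardized D(h)/sqrt(var(D)) from the
question. Points are (x, y) tuples held in lists.

```python
import math
import random


def k_function(points, h_values, area):
    """Naive O(n^2) estimate of Ripley's K, without edge correction."""
    n = len(points)
    counts = [0] * len(h_values)
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            for k, h in enumerate(h_values):
                if d <= h:
                    counts[k] += 2  # each unordered pair counts twice
    return [area * c / (n * (n - 1)) for c in counts]


def d_statistic(cases, controls, h_values, area):
    """D(h) = K_cases(h) - K_controls(h) for each distance h."""
    k_case = k_function(cases, h_values, area)
    k_ctrl = k_function(controls, h_values, area)
    return [a - b for a, b in zip(k_case, k_ctrl)]


def random_labelling_test(cases, controls, h_values, area, n_sim=99, rng=None):
    """Monte Carlo p-value for max |D(h)| under random relabelling."""
    rng = rng or random.Random()
    observed = max(abs(d) for d in d_statistic(cases, controls, h_values, area))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # reassign case/control labels at random
        sim = d_statistic(pooled[:n_cases], pooled[n_cases:], h_values, area)
        if max(abs(d) for d in sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


def subsampled_p_values(cases, controls, h_values, area,
                        n_controls=4000, n_repeats=5, n_sim=99, seed=0):
    """Repeat the test on several random subsamples of the controls."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_repeats):
        subset = rng.sample(controls, min(n_controls, len(controls)))
        p_values.append(
            random_labelling_test(cases, subset, h_values, area, n_sim, rng))
    return p_values
```

 Calling subsampled_p_values(cases, controls, h_values, area) gives one
p-value per subsample; if they tell broadly the same story, the
conclusion does not hinge on which controls happened to be drawn.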

Barry


