[R-sig-Geo] skater - spdep runtime - geographic territories

Elias T. Krainski
Wed Jun 12 00:52:10 CEST 2019


Hi Salo,

I implemented it several years ago and it is not optimal in some ways. 
I will update it in the near future to use a heuristic that avoids the 
exhaustive search it currently performs. For now, you can get a 
significant runtime reduction by using an alternative function to 
compute the ssw, because the default computation uses a lot of memory 
and scales badly to big datasets.
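
As a rough illustration of what a lighter within-group cost can look like, here is a minimal sketch. It is not the attached ssdfun(); the name ssd_sketch and the toy data are made up for this example. It sums squared deviations from the group's column means, so it never builds a pairwise distance matrix. Whether and how skater() accepts a user-supplied function depends on the spdep version, so please check ?skater before plugging something like this in.

```r
# Hypothetical helper (not the attached ssdfun()): within-group sum of
# squared deviations from the column means. Memory use grows with the
# number of rows in the group, not with the number of row pairs.
ssd_sketch <- function(data, id) {
  x <- as.matrix(data)[id, , drop = FALSE]   # rows belonging to the group
  centered <- sweep(x, 2, colMeans(x))       # subtract each column mean
  sum(centered^2)
}

# Toy data, made up for illustration only
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10)
grp <- 1:5

# Same quantity via all pairwise squared Euclidean distances, which is
# what becomes expensive when groups are large
via_pairs <- sum(dist(toy[grp, ])^2) / length(grp)

all.equal(ssd_sketch(toy, grp), via_pairs)
```

The two computations agree because the sum of squared pairwise distances in a group equals the group size times the sum of squared deviations from the centroid.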

Please see the attached code, which illustrates this. When using 
ssdfun() I saw a runtime reduction factor of around 4 for n=2k, and an 
additional reduction factor of 1.6 from using two (physical) cores. 
These are the results I got on my laptop:

       n t1 t2 t3 t4
15  225  1  1  1  1
20  400  1  1  1  1
25  625  4  3  3  2
30  900 10  5  6  4
35 1225 21  8 13  5
40 1600 39 12 23  8
45 2025 86 24 50 15

best regards,

Elias

On 6/11/19 5:21 PM, Salo V wrote:
> Hi Everyone,
>
> I am trying to run the skater function for graph partitions, part of the
> spdep package. My goal is to create contiguous territories for the entire
> USA at the ZIP Code level.
>
> The function takes a very long time to run even for ~15% of my total areas.
> I am looking to run this for the 30,000 ZIP Codes in the USA.
>
> The skater function documentation gives an example of parallel processing,
> but it doesn’t seem to be speeding things up. I have a Windows laptop with
> 2 physical cores and 4 logical cores. In the code below, I have already
> tried setting nc=1, nc=2 and nc=4, all with very similar run times.
>
> Has anyone been able to run the skater function for a large number of areas
> in a reasonable amount of time? I would really appreciate any guidance on
> this; perhaps I am missing some steps.
>
>
>
> Here is the example from the documentation, which I am also running.
>
> library(parallel)
> nc <- detectCores(logical=FALSE)
> # set nc to 1L here
> if (nc > 1L) nc <- 1L
> coresOpt <- get.coresOption()
> invisible(set.coresOption(nc))
> if(!get.mcOption()) {
>   # no-op, "snow" parallel calculation not available
>   cl <- makeCluster(get.coresOption())
>   set.ClusterOption(cl)
> }
>
> ### calculating costs
> system.time(plcosts <- nbcosts(bh.nb, dpad))
> all.equal(lcosts, plcosts, check.attributes=FALSE)
>
> ### making listw
> pnb.w <- nb2listw(bh.nb, plcosts, style="B")
>
> ### find a minimum spanning tree
> pmst.bh <- mstree(pnb.w,5)
>
> ### three groups with no restriction
> system.time(pres1 <- skater(pmst.bh[,1:2], dpad, 2))
>
> if(!get.mcOption()) {
>   set.ClusterOption(NULL)
>   stopCluster(cl)
> }
>
>
> much appreciated!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: elapsed-time-ssdfun.R
Type: text/x-r-source
Size: 2521 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20190611/bc768363/attachment.bin>

