[R-sig-Geo] GStat 'predict.c' parallel estimation

Edzer Pebesma edzer.pebesma at uni-muenster.de
Thu Aug 27 08:55:54 CEST 2015



On 08/27/2015 02:45 AM, Tim Peterson wrote:
> 
> 
> Hi all,
> 
> I'm considering editing the GStat kriging file 'predict.c' (while loop
> at line 151-165) for parallel estimation. I require parallel estimation
> for both kriging and indicator simulation and will be applying it to a
> region of ~30 million grid cells, which will be repeated for 300 time
> points.
> 
> I appreciate that snow can be used for parallel kriging
> (https://r-forge.r-project.org/scm/viewvc.php/pkg/demo/snow.R?view=markup&revision=89&root=gstat)
> but, because of my computation burden, I require parallel calculation
> using Xeon Phi cards - hence I think the changes should be in
> 'predict.c'. Also, in looking at the GStat C-code I suspect that the
> global variables in 'glvars.c' may complicate the parallel calculations.
> So anyway, before I embark upon this task (most likely using openMP),
> I'd be grateful if anyone could share tips or thoughts (and my edits
> will be freely shared.)

I've always thought of these global variables as being constant during
program execution. R being non-thread safe, I'm not sure how well R
combines with embedded C code that uses threads.

> 
> Also, some may be questioning the parallel estimation of each
> simulation. Considering that I'm estimating ~30 million cells, and will
> only be using ~70 cores per simulation, I am happy to accept the
> deficiency that recently estimated cells will not be used in the
> estimation of nearby cells within a concurrent parallel cycle. However,
> the estimates do need to be accessible to latter parallel cycles.

I can see that a shared memory approach could mess up the set of
previously simulated values ("msim" in msim.c), and that filling in
locally could be done in independent threads; yet, what do you do at the
boundaries of these regions?

As an alternative to this approach (and the one you sketch below) you
could think about a circulant embedding algorithm that uses a parallel
version of the fft; see fields:sim.rf and the routines in package
RandomFields.

> 
> Lastly, my backup option is to use openMP to parallelise the GSLib
> Fortran subroutines 'cokb3d.for' and 'sisim.for' (ie outside of R). This
> is mainly because the subroutines are fairly standalone and don't use
> pointers - and hence possibly less problematic to make parallel.
> 
> Many thanks,
> 
> Tim
> 
> ----------------------
> Dr. Tim Peterson
> 
> The Department of Infrastructure Engineering
> The University of Melbourne, 3010 Australia
> T: +61 3 8344 9950, M: +61 0438 385 937
> 
> Dept. profile :
> http://www.ie.unimelb.edu.au/people/staff.php?person_ID=141135
> Research Gate : https://www.researchgate.net/profile/Tim_Peterson7
> Google Scholar:
> http://scholar.google.com.au/citations?user=kkYJLF4AAAAJ&hl=en&oi=ao
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi),  University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20150827/54a4ea40/attachment.bin>


More information about the R-sig-Geo mailing list