[R-sig-Geo] GStat 'predict.c' parallel estimation
timjp at unimelb.edu.au
Fri Aug 28 05:15:47 CEST 2015
On 27/08/15 16:55, Edzer Pebesma wrote:
> On 08/27/2015 02:45 AM, Tim Peterson wrote:
>> Hi all,
>> I'm considering editing the GStat kriging file 'predict.c' (while loop
>> at line 151-165) for parallel estimation. I require parallel estimation
>> for both kriging and indicator simulation and will be applying it to a
>> region of ~30 million grid cells, which will be repeated for 300 time
>> I appreciate that snow can be used for parallel kriging
>> but, because of my computation burden, I require parallel calculation
>> using Xeon Phi cards - hence I think the changes should be in
>> 'predict.c'. Also, in looking at the GStat C-code I suspect that the
>> global variables in 'glvars.c' may complicate the parallel calculations.
>> So anyway, before I embark upon this task (most likely using openMP),
>> I'd be grateful if anyone could share tips or thoughts (and my edits
>> will be freely shared.)
> I've always thought of these global variables as being constant during
> program execution. R being non-thread safe, I'm not sure how well R
> combines with embedded C code that uses threads.
Thanks again Edzer. That's good to know about the global variables. Re
the threaded C code working with R, my understanding was that the while
loop undertaking the estimation is not interacting with R and so can be
made parallel without (hopefully) influencing R. Does this sound reasonable?
>> Also, some may be questioning the parallel estimation of each
>> simulation. Considering that I'm estimating ~30 million cells, and will
>> only be using ~70 cores per simulation, I am happy to accept the
>> deficiency that recently estimated cells will not be used in the
>> estimation of nearby cells within a concurrent parallel cycle. However,
>> the estimates do need to be accessible to latter parallel cycles.
> I can see that a shared memory approach could mess up the set of
> previously simulated values ("msim" in msim.c), and that filling in
> locally could be done in independent threads; yet, what do you do at the
> boundaries of these regions?
To clarify, the best approach I see for simulations is to randomly
select, say, 10*"number of cores" grid cells and undertake their
parallel estimation. These solutions are then added back into the "msim"
and another 10*"number of cores" grid cells are then selected and
estimated. This would overcome the boundary effects but would mean that
the cores may not be running at 100% all of the time (ie all 10*"number
of cores" cells would need to finish before the next batch could start)
> As an alternative to this approach (and the one you sketch below) you
> could think about a circulant embedding algorithm that uses a parallel
> version of the fft; see fields:sim.rf and the routines in package
Thanks. I didn't know about circulant embedding. However, my problem is
high non-bi-Gaussian. The extreme values within the system (which is
water table elevation) are significantly more correlated than 'average'
>> Lastly, my backup option is to use openMP to parallelise the GSLib
>> Fortran subroutines 'cokb3d.for' and 'sisim.for' (ie outside of R). This
>> is mainly because the subroutines are fairly standalone and don't use
>> pointers - and hence possibly less problematic to make parallel.
>> Many thanks,
>> Dr. Tim Peterson
>> The Department of Infrastructure Engineering
>> The University of Melbourne, 3010 Australia
>> T: +61 3 8344 9950, M: +61 0438 385 937
>> Dept. profile :
>> Research Gate : https://www.researchgate.net/profile/Tim_Peterson7
>> Google Scholar:
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
More information about the R-sig-Geo