[R-sig-hpc] RFC: Checkpoint-Restart for R/HPC (DMTCP)

Qiang Kou qkou at umail.iu.edu
Sun Jan 31 04:04:37 CET 2016


Hi, Gene,

I know DMTCP from the SciPy conference, where your colleague showed a Python
binding.

I have also tried invoking DMTCP from inside R, in the same way as your
Python binding does.

As far as I remember, it was not difficult.

Best,

KK

On Mon, Jan 25, 2016 at 8:03 PM, Gene Cooperman <gene at ccs.neu.edu> wrote:

> Hi Chirag,
>
>     This should work.  In my case, I would probably try running
> a job on a cloud as follows:
>
>     [ copy DMTCP executables to job submission directory ]
>     path_to_dmtcp_root/bin/dmtcp_launch -i 30 Rscript myscript.R
>
> This would create a checkpoint every 30 seconds.  So, every 30 seconds,
> we get a new version of the following files:
>
>     ckpt_myscript.R_*.dmtcp
>     dmtcp_restart_script_*.sh
>     dmtcp_restart_script.sh  (symbolic link to dmtcp_restart_script_*.sh)
>
> If a job crashes, one copies the above files to a new directory, and
> submits a new Cloud job:
>
>     [ copy DMTCP executables to job submission directory ]
>     ./dmtcp_restart_script.sh -i 30
>
> The script should automatically link to the file ckpt_myscript.R_*.dmtcp .
> An alternative approach would be:
>
>     path_to_dmtcp_root/bin/dmtcp_restart -i 30 ckpt_myscript.R_*.dmtcp
>
> Please don't hesitate to ask, if I can help further.
>
> Best,
> - Gene
>
>
> On Mon, Jan 25, 2016 at 05:26:58PM +0530, Chirag Anand wrote:
> > This can indeed be very useful, especially when using one of the
> > cloud services. Cloud VMs often crash because of an error on the host
> > system, thereby losing the state of the program (R computations). I think
> > Google Compute Engine supports live migration of VMs, though I am not
> > sure which technology they use, but AWS does not.
> >
> ...
> >
> > --
> > Chirag Anand
> > http://atvariance.in/chiraganand
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
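
The launch and restart commands quoted above could be folded into a single
job wrapper, so that the same submission either resumes from the last
checkpoint or starts fresh. This is only a minimal sketch, not something
taken from DMTCP itself: the function name run_dmtcp_job and the variables
DMTCP_BIN, CKPT_INTERVAL and DRY_RUN are hypothetical names chosen for the
example, and it assumes the restart script sits in the current directory.

```shell
#!/bin/sh
# Sketch only: DMTCP_BIN, CKPT_INTERVAL, DRY_RUN and run_dmtcp_job are
# hypothetical names for this example, not part of DMTCP.

DMTCP_BIN="${DMTCP_BIN:-path_to_dmtcp_root/bin}"   # where the DMTCP executables were copied
CKPT_INTERVAL="${CKPT_INTERVAL:-30}"               # seconds between checkpoints

# Resume from an existing checkpoint if a restart script is present in the
# current directory; otherwise launch the given R script under DMTCP.
run_dmtcp_job() {
    script="$1"
    if [ -x ./dmtcp_restart_script.sh ]; then
        # A previous run left a restart script behind: resume from it.
        set -- ./dmtcp_restart_script.sh -i "$CKPT_INTERVAL"
    else
        # No checkpoint yet: start the job from the beginning.
        set -- "$DMTCP_BIN/dmtcp_launch" -i "$CKPT_INTERVAL" Rscript "$script"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"      # only show the command that would be run
    else
        "$@"           # actually run it
    fi
}
```

After sourcing this file in the job script, calling "run_dmtcp_job myscript.R"
from the job submission directory would run the job, and prefixing the call
with DRY_RUN=1 only prints the command it would run.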



-- 
Qiang Kou
qkou at umail.iu.edu
School of Informatics and Computing, Indiana University
