[R-sig-hpc] RFC: Checkpoint-Restart for R/HPC (DMTCP)
Qiang Kou
qkou at umail.iu.edu
Sun Jan 31 04:04:37 CET 2016
Hi, Gene,
I know DMTCP from the SciPy conference, where your colleague showed a
Python binding.
I have also tried invoking DMTCP from inside R, much as your Python
binding does. As I remember, it was not difficult.
Best,
KK
On Mon, Jan 25, 2016 at 8:03 PM, Gene Cooperman <gene at ccs.neu.edu> wrote:
> Hi Chirag,
>
> This should work. In my case, I would probably try running
> a job on a cloud as follows:
>
> [ copy DMTCP executables to job submission directory ]
> path_to_dmtcp_root/bin/dmtcp_launch -i 30 Rscript myscript.R
>
> This would create a checkpoint every 30 seconds. So, every 30 seconds,
> we get a new version of the following files:
>
> ckpt_myscript.R_*.dmtcp
> dmtcp_restart_script_*.sh
> dmtcp_restart_script.sh (symbolic link to dmtcp_restart_script_*.sh)
>
> If a job crashes, one copies the above files to a new directory, and
> submits a new Cloud job:
>
> [ copy DMTCP executables to job submission directory ]
> ./dmtcp_restart_script.sh -i 30
>
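The copy step described above could be sketched roughly as follows. This is
only an illustration: collect_checkpoint_files is a hypothetical helper, not
part of DMTCP, and it assumes the checkpoint images and restart scripts use
the file name patterns listed earlier in this message.

```shell
#!/bin/sh
# Sketch only: collect_checkpoint_files is a hypothetical helper, not a DMTCP tool.
# It copies checkpoint images and restart scripts from the crashed job's
# directory ($1) into a fresh directory ($2) for resubmission.
collect_checkpoint_files() {
    src=$1
    dest=$2

    # Refuse to continue if the crashed job left no checkpoint image behind.
    set -- "$src"/ckpt_*.dmtcp
    if [ ! -e "$1" ]; then
        echo "no checkpoint images found in $src" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1

    # cp follows symbolic links by default, so dmtcp_restart_script.sh
    # arrives in $dest as a regular copy of the script it pointed to.
    cp "$src"/ckpt_*.dmtcp "$src"/dmtcp_restart_script*.sh "$dest"/
}
```

A call such as `collect_checkpoint_files old_job_dir new_job_dir` (both
directory names are placeholders) would then be followed by the restart
command from inside the new directory.
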
> The script should automatically link to the file ckpt_myscript.R_*.dmtcp .
> An alternative approach would be:
>
> path_to_dmtcp_root/bin/dmtcp_restart -i 30 ckpt_myscript.R_*.dmtcp
>
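For a job that may be resubmitted several times, the launch and restart
commands above could be combined into one wrapper, roughly as sketched here.
The names run_or_resume, DMTCP_BIN, and DRY_RUN are assumptions made for
illustration and are not part of DMTCP; the sketch also assumes the restart
script, if present, sits in the current directory and is executable.

```shell
#!/bin/sh
# Sketch only: run_or_resume, DMTCP_BIN and DRY_RUN are illustrative names,
# not part of DMTCP. DMTCP_BIN defaults to the placeholder path used above.
DMTCP_BIN=${DMTCP_BIN:-path_to_dmtcp_root/bin}

# Usage: run_or_resume <checkpoint-interval-seconds> <R script>
run_or_resume() {
    interval=$1
    script=$2

    if [ -x ./dmtcp_restart_script.sh ]; then
        # A previous run left a restart script behind: resume from it.
        set -- ./dmtcp_restart_script.sh -i "$interval"
    else
        # No checkpoint yet: start the R script fresh under DMTCP.
        set -- "$DMTCP_BIN/dmtcp_launch" -i "$interval" Rscript "$script"
    fi

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it, to inspect the choice.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `run_or_resume 30 myscript.R` from the job submission directory would
then pick the fresh launch or the restart depending on what the directory
contains; setting DRY_RUN=1 beforehand only prints the chosen command.
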
> Please don't hesitate to ask, if I can help further.
>
> Best,
> - Gene
>
>
> On Mon, Jan 25, 2016 at 05:26:58PM +0530, Chirag Anand wrote:
> > This can indeed be very useful, especially when using one of the
> > cloud services. Cloud VMs often crash because of an error on the main
> > system, thereby losing the state of the program (the R computations).
> > I think Google Compute Engine supports live migration of VMs, though I
> > am not sure which technology it uses; AWS does not.
> >
> ...
> >
> > --
> > Chirag Anand
> > http://atvariance.in/chiraganand
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
--
Qiang Kou
qkou at umail.iu.edu
School of Informatics and Computing, Indiana University