[Rd] save() and interrupts

Tue Apr 17 14:48:30 CEST 2007

On Mon, 16 Apr 2007, Henrik Bengtsson wrote:

> On 4/16/07, Luke Tierney <luke at stat.uiowa.edu> wrote:
>> On Mon, 16 Apr 2007, Bill Dunlap wrote:
>> 
>> > On Sun, 15 Apr 2007, Henrik Bengtsson wrote:
>> >
>> >> On 4/15/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>> >>> On Sun, 15 Apr 2007, Henrik Bengtsson wrote:
>> >>>
>> >>>> are there any (cross-platform) specs on what the saved file is if
>> >>>> save() is interrupted, e.g. by a user interrupt?   It could be
>> >>>> non-existing, empty, partly written, or completed.
>> >>>
>> >>> My understanding is that you cannot user interrupt compiled code unless it
>> >>> is set up to check interrupts.  Version 2 saves are done via the internal
>> >>> saveToConn, and I don't see any calls to R_CheckUserInterrupt there. So
>> >>> you only need to worry about user interrupts in the R code, and that has
>> >>> an on.exit action to close the connection (which should be executed even
>> >>> if you interrupt).  Which suggests that the file will be
>> >>>
>> >>> non-existent
>> >>> empty
>> >>> complete
>> >>>
>> >>> and the first two depend on interrupting in the millisecond or less before
>> >>> the compiled code gets called.
>> >>
>> >> I'll put it on my todo list to investigate how to make save() more
>> >> robust against interrupts before calling the internal code.  One
>> >> option is to use tryCatch().  However, that does not handle too
>> >> frequent user interrupts, e.g. if an interrupt is sent while in the
>> >> "interrupt" call, that will interrupt the function.  So, tryCatch()
>> >> alone will only lower the risk for incomplete or empty files.  For data
>> >> written to files, one alternative is to check for files of zero size
>> >> in the on.exit() statement and remove such.
>> >>
>> >> /Henrik
>> >>>
>> >>> For other forms of interrupts, e.g. a Unix kill -9, the file state could
>> >>> be anything.
>> >>>
>> >>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> >>> ...
>> >
>> > You could change the code to write to a temporary
>> > file (in the directory you want the result in) and
>> > when you successfully finish writing to the file
>> > you rename it to the permanent name.  (On an interrupt
>> > you remove the temp file, and on 'kill -9' the only
>> > bad effect is the space used by the partially written
>> > temp file.)  This has the added advantage that you don't
>> > overwrite an existing save file by the given name until
>> > you know a suitable replacement is ready.
>> >
>> > Perhaps we need a connection type that encapsulates this.
>> >
>> > ----------------------------------------------------------------------------
>> > Bill Dunlap
>> > Insightful Corporation
>> > bill at insightful dot com
>> > 360-428-8146
>> >
>> > "All statements in this message represent the opinions of the author and do
>> > not necessarily reflect Insightful Corporation policy or position."
>> 
>> We do this with save.image.  Since save is a little more general it is
>> a bit less obvious what the right way to do this sort of thing is, or
>> whether there is a single right way.  I think if I was concerned about
>> this I would write something around the current save for particular
>> kinds of connections rather than changing save itself.  The main
>> reason for taking a different route with save.image is that that gets
>> called implicitly by q().
>> 
>> [our current ability to manage user interrupts is not ideal--hopefully
>> we can make a bit of progress on this soon.]
>
> I was thinking about this last night:  It would be useful to have a
> feature/construct to evaluate an R expression atomically where user
> interrupts will *not have an effect until afterwards*, cf. calls to
> native code.  This would solve the problem of getting interrupts while
> in a tryCatch(..., interrupt=..., finally=...).  Of course this
> requires caution by the programmer, but it is also unlikely to be used
> by someone who does not know what the risks are.  I do not know the
> different signals available, but one could consider such atomic calls
> to be protected against different levels of signals.  In addition, one
> could have an optional threshold of the number of interrupt signals it
> takes to (even) interrupt an atomic evaluation.

This is the sort of thing I have been thinking about. One also needs to
enable interrupts within selected parts of such a construct, and these
things need to cooperate with each other and with internal code. There
is a paper on doing these sorts of things in a principled way in
Haskell that I want to spend some time reading to see what translates
to us.
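
In the meantime, the temporary-file-and-rename idea quoted above can be
written as a wrapper around the current save() rather than a change to
save() itself. What follows is only a minimal sketch under assumptions:
the name safeSave and its arguments are hypothetical (nothing like it
exists in R), it only handles saving to a file name, and it is not how
save.image is implemented. Whether the rename replaces an existing file
atomically depends on the platform and file system.

```r
## Hypothetical helper, not part of R: write the save file under a
## temporary name in the target directory, then rename it into place.
safeSave <- function(list, file, envir = parent.frame()) {
    force(envir)  # fix the caller's environment before doing anything else
    ## The temporary file lives in the same directory as the target, so
    ## the rename does not have to cross file systems.
    tmp <- tempfile(pattern = paste(basename(file), ".tmp", sep = ""),
                    tmpdir = dirname(file))
    ## On error or user interrupt, remove whatever was written so far.
    ## After a successful rename the temporary name no longer exists.
    on.exit(if (file.exists(tmp)) unlink(tmp))
    save(list = list, file = tmp, envir = envir)
    if (!file.rename(tmp, file))
        stop("could not rename ", tmp, " to ", file)
    invisible(file)
}
```

It would be called as safeSave(c("x", "y"), file = "xy.RData"). An
existing xy.RData is left untouched until the new file has been written
completely. This still does not protect against an interrupt arriving
inside the on.exit action itself, which is the problem discussed above.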

Best,

luke

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu