[Rd] proper use of reg.finalizer to close connections
Murat Tasan
mmuurr at gmail.com
Mon Oct 27 18:10:26 CET 2014
Eh, after some flailing, I think I solved it.
I _think_ this pattern should guarantee that the finalizer function is
still present when needed:
.STATE_CONTAINER <- new.env(parent = emptyenv())
.STATE_CONTAINER$some_state_variable <- NULL        ## placeholder; some code
.STATE_CONTAINER$some_other_state_variable <- NULL  ## placeholder; some code
.myFinalizer <- function(name_of_state_variable_to_clean_up) {
  ## clean up .STATE_CONTAINER[[name_of_state_variable_to_clean_up]] here
}
.onLoad <- function(libname, pkgname) {
  reg.finalizer(
    e = parent.env(environment()),
    f = function(env) sapply(ls(env$.STATE_CONTAINER), .myFinalizer),
    onexit = TRUE)
}
This way, the finalizer is registered on the enclosing environment of
the .onLoad function, which should be the package's namespace environment.
And that means .myFinalizer should still be around when it's called
during q() or unload/gc().
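As a rough illustration of that lookup, the following sketch can be run outside any package. It assumes a plain environment `ns` as a stand-in for the package namespace, and `cleanup_log`, `conn_a`, and `conn_b` are hypothetical names used only to make the effect visible. The finalizer reaches the cleanup function through the environment it is registered on, so the function is present whenever the finalizer runs.

```r
# Log kept outside `ns` so it outlives the collected environment (hypothetical).
cleanup_log <- new.env()
cleanup_log$closed <- character(0)

# Stand-in for the package namespace; a real package would instead use
# parent.env(environment()) inside .onLoad.
ns <- new.env()
ns$.STATE_CONTAINER <- new.env(parent = emptyenv())
ns$.STATE_CONTAINER$conn_a <- "open"
ns$.STATE_CONTAINER$conn_b <- "open"

# Cleanup function stored in the same environment the finalizer is tied to.
ns$.myFinalizer <- function(name) {
  cleanup_log$closed <- c(cleanup_log$closed, name)
  invisible(NULL)
}

# The finalizer looks up .myFinalizer via `env`, i.e. via the finalized
# environment itself, rather than relying on some other search path.
reg.finalizer(
  e = ns,
  f = function(env) {
    for (name in ls(env$.STATE_CONTAINER)) env$.myFinalizer(name)
  },
  onexit = TRUE
)

# Drop the only reference and ask for a collection.
rm(ns)
invisible(gc())

# Names handled by the finalizer once the collector has run it.
cleanup_log$closed
```

The exact moment a finalizer runs is up to the garbage collector, so code like this is best treated as a diagnostic aid rather than a guarantee about ordering.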
Effectively, the finalizer is tied to the entire package, rather than
the state variable container(s), which might not be the most elegant
solution, but it should work well enough for most purposes.
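For comparison, the finer-grained alternative suggested earlier in the thread gives each connection its own environment and its own finalizer. This is only a sketch: `new_connection`, the `open` flag, and `cleanup_log` are hypothetical stand-ins for real connection handling.

```r
# Hypothetical log so the effect of each finalizer can be inspected.
cleanup_log <- new.env()
cleanup_log$closed <- character(0)

# Hypothetical constructor: one environment per connection, each with
# its own finalizer registered at creation time.
new_connection <- function(name) {
  conn <- new.env(parent = emptyenv())
  conn$name <- name
  conn$open <- TRUE
  reg.finalizer(conn, function(e) {
    # Guard against closing twice (e.g. after an explicit disconnect).
    if (isTRUE(e$open)) {
      e$open <- FALSE
      cleanup_log$closed <- c(cleanup_log$closed, e$name)
    }
  }, onexit = TRUE)
  conn
}

a <- new_connection("resource1")
b <- new_connection("resource2")

# Only `a` becomes unreachable; `b` is still referenced and stays open.
rm(a)
invisible(gc())

cleanup_log$closed
b$open
```

With this layout each resource is cleaned up when it individually becomes unreachable, at the cost of registering one finalizer per connection.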
Cheers and thanks for the advice,
-m
On Mon, Oct 27, 2014 at 12:18 AM, Murat Tasan <mmuurr at gmail.com> wrote:
> Ah, good point, I hadn't thought of that detail.
> Would moving reg.finalizer back outside of .onLoad and hooking it to the
> package's environment itself work (more safely)?
> Something like:
> finalizerFunction <- function(env) { ## cleanup code }
> reg.finalizer(environment(), finalizerFunction)
>
> -m
>
> On Oct 26, 2014 11:03 PM, "Henrik Bengtsson" <hb at biostat.ucsf.edu> wrote:
>>
>> On Sun, Oct 26, 2014 at 8:14 PM, Murat Tasan <mmuurr at gmail.com> wrote:
>> > Ah (again)!
>> > Even with my fumbling presentation of the issue, you gave me the hint
>> > that solved it, thanks!
>> >
>> > Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so
>> > it's not called once during package installation and then never again.
>> > And once I switched to using ls() (instead of names()), everything
>> > works as expected.
>> >
>> > So, the package code effectively looks like so:
>> >
>> > .CONNS <- new.env(parent = emptyenv())
>> > .onLoad <- function(libname, pkgname) {
>> >   reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
>> > }
>> > .disconnect <- function(x) {
>> >   ## handle disconnection of .CONNS[[x]] here
>> > }
>>
>> In your example above, I would be concerned about what happens if you
>> detach/unload your package, because then your finalizer is still
>> registered and will be called whenever '.CONNS' is garbage
>> collected (or thereafter). However, the finalizer function calls
>> .disconnect(), which is no longer available.
>>
>> Finalizers should be used with great care, because you're not in
>> control of the order in which things occur, which "resources" are
>> still around when the finalizer function is eventually called, or when
>> it is called. I've been bitten by this a few times and it can be very hard
>> to reproduce and troubleshoot such bugs. See also the 'Note' of
>> ?reg.finalizer.
>>
>> My $.02
>>
>> /Henrik
>>
>> >
>> > Cheers and thanks!
>> >
>> > -m
>> >
>> >
>> >
>> >
>> > On Sun, Oct 26, 2014 at 8:53 PM, Gábor Csárdi <csardi.gabor at gmail.com>
>> > wrote:
>> >> Well, to be honest I don't understand fully what you are trying to do.
>> >> If you want to run code when the package is detached or when it is
>> >> unloaded, then use a hook:
>> >> http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Load-hooks
>> >>
>> >> If you want to run code when an object is freed, then use a finalizer.
>> >>
>> >> Note that when you install a package, R runs all the code in the
>> >> package and only stores the results of the code in the installed
>> >> package. So if you create an object outside of a function in your
>> >> package, then only the object will be stored in the package, but not
>> >> the code that creates it. The object will be simply loaded when you
>> >> load the package, but it will not be re-created.
>> >>
>> >> Now, I am not sure what happens if you set the finalizer on such an
>> >> object in the package. I can imagine that the finalizer will not be
>> >> saved into the package, and is only used once, when
>> >> building/installing the package. In this case you'll need to set the
>> >> finalizer in .onLoad().
>> >>
>> >> Gabor
>> >>
>> >> On Sun, Oct 26, 2014 at 10:35 PM, Murat Tasan <mmuurr at gmail.com> wrote:
>> >>> Ah, thanks for the ls() vs names() tip!
>> >>> (But sadly, it didn't solve the issue... )
>> >>>
>> >>> So, after some more tinkering, I believe the finalizer is being called
>> >>> _sometimes_.
>> >>> I changed the reg.finalizer(...) call to just this:
>> >>>
>> >>> reg.finalizer(.CONNS, function(x) print("foo"), onexit = TRUE)
>> >>>
>> >>> Now, when I load the package and detach(..., unload = TRUE), nothing
>> >>> prints.
>> >>> And when I quit, nothing prints.
>> >>>
>> >>> If I, however, create an environment on the workspace, like so:
>> >>>> e <- new.env(parent = emptyenv())
>> >>>> reg.finalizer(e, function(x) print("bar"), onexit = TRUE)
>> >>> When I quit (or rm(e)), "bar" is printed.
>> >>> But no "foo" (corresponding to same sequence of code, just in the
>> >>> package instead).
>> >>>
>> >>> BUT(!), when I _install_ the package, "foo" is printed at the end of
>> >>> the "**testing if installed package can be loaded" installation
>> >>> segment.
>> >>> So, somehow the R script that tests for package loading/unloading is
>> >>> triggering the finalizer (which is good).
>> >>> Yet, I cannot seem to trigger it myself when either quitting or
>> >>> forcing a package unload (which is bad).
>> >>>
>> >>> Any ideas why the installation script would successfully trigger a
>> >>> finalizer while standard unloading or quitting wouldn't?
>> >>>
>> >>> Cheers and thanks!
>> >>>
>> >>> -m
>> >>>
>> >>> On Sun, Oct 26, 2014 at 8:03 PM, Gábor Csárdi <csardi.gabor at gmail.com>
>> >>> wrote:
>> >>>> Hmmm, I guess you will want to put the actual objects that represent
>> >>>> the connections into the environment, at least this seems to be the
>> >>>> easiest to me. Btw. you need ls() to list the contents of an
>> >>>> environment, instead of names(). E.g.
>> >>>>
>> >>>> e <- new.env()
>> >>>> e$foo <- 10
>> >>>> e$bar <- "aaa"
>> >>>> names(e)
>> >>>> #> NULL
>> >>>> ls(e)
>> >>>> #> [1] "bar" "foo"
>> >>>> reg.finalizer(e, function(x) { print(ls(x)) })
>> >>>> #> NULL
>> >>>> rm(e)
>> >>>> gc()
>> >>>> #> [1] "bar" "foo"
>> >>>> #>           used (Mb) gc trigger  (Mb) max used  (Mb)
>> >>>> #> Ncells 1528877 81.7    2564037 137.0  2564037 137.0
>> >>>> #> Vcells 3752538 28.7    7930384  60.6  7930356  60.6
>> >>>>
>> >>>> More precisely, you probably want to represent each connection as a
>> >>>> separate environment, with its own finalizer. Hope this helps,
>> >>>> Gabor
>> >>>>
>> >>>> On Sun, Oct 26, 2014 at 9:49 PM, Murat Tasan <mmuurr at gmail.com>
>> >>>> wrote:
>> >>>>> Hi all, I have a question about finalizers...
>> >>>>> I have a package that manages state for a few connections, and I'd
>> >>>>> like to ensure that these connections are 'cleanly' closed upon
>> >>>>> either
>> >>>>> (i) R quitting or (ii) an unloading of the package.
>> >>>>> So, in a pared-down example package with a single R file, it looks
>> >>>>> something like:
>> >>>>>
>> >>>>> ##### BEGIN PACKAGE CODE #####
>> >>>>> .CONNS <- new.env(parent = emptyenv())
>> >>>>> .CONNS$resource1 <- NULL
>> >>>>> .CONNS$resource2 <- NULL
>> >>>>> ## some more .CONNS resources...
>> >>>>>
>> >>>>> reg.finalizer(.CONNS, function(x) sapply(names(x), disconnect),
>> >>>>>               onexit = TRUE)
>> >>>>>
>> >>>>> connect <- function(x) {
>> >>>>>   ## here lies code to connect and update .CONNS[[x]]
>> >>>>> }
>> >>>>> disconnect <- function(x) {
>> >>>>>   print(sprintf("disconnect(%s)", x))
>> >>>>>   ## here lies code to disconnect and update .CONNS[[x]]
>> >>>>> }
>> >>>>> ##### END PACKAGE CODE #####
>> >>>>>
>> >>>>> The print(...) statement in disconnect(...) is there as a trace, as
>> >>>>> I
>> >>>>> hoped that I'd see disconnect(...) being called when I quit (or
>> >>>>> detach(..., unload = TRUE)).
>> >>>>> But, it doesn't appear that disconnect(...) is ever called when the
>> >>>>> package (and .CONNS) falls out of memory/scope (and I ran gc() after
>> >>>>> detach(...), just to be sure).
>> >>>>>
>> >>>>> In a second 'shot-in-the-dark' attempt, I placed the reg.finalizer
>> >>>>> call inside an .onLoad function, but that didn't seem to work,
>> >>>>> either.
>> >>>>>
>> >>>>> I'm guessing my use of reg.finalizer is way off-base here... but I
>> >>>>> cannot infer from the reg.finalizer man page what I might be doing
>> >>>>> wrong.
>> >>>>> Is there a way to see, at the R-system level, what functions have
>> >>>>> been
>> >>>>> registered as finalizers?
>> >>>>>
>> >>>>> Thanks for any pointers!
>> >>>>>
>> >>>>> -Murat
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-devel at r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel