[Rd] Fixing the HDF5 package: the on.exit mystery
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 4 22:12:20 CET 2011
Looks like you should be using finalizers instead. See the RODBC
package for an example of this.
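
A minimal sketch of the finalizer idea in plain R, under stated assumptions. It is not taken from RODBC or from the hdf5 package: fake_open, fake_close, close_handle, new_handle and load_file are hypothetical stand-ins so that the example runs without the HDF5 library. Only reg.finalizer() and gc() are real base R functions here.

```r
# Hypothetical stand-ins for the real open/close calls: they only track
# which ids are "open" so the effect of the finalizer can be observed.
open_ids <- integer(0)

fake_open <- function(path) {
  fid <- length(open_ids) + 1L
  open_ids <<- c(open_ids, fid)
  fid
}

fake_close <- function(fid) {
  open_ids <<- setdiff(open_ids, fid)
  invisible(NULL)
}

# Finalizer: receives the handle object and releases the resource it holds.
# Defined at top level so that it does not capture the handle itself.
close_handle <- function(e) fake_close(e$fid)

# The handle is an environment that owns the id. The finalizer runs when the
# handle is garbage collected; onexit = TRUE also runs it at session end.
new_handle <- function(path) {
  h <- new.env(parent = emptyenv())
  h$fid <- fake_open(path)
  reg.finalizer(h, close_handle, onexit = TRUE)
  h
}

# Hypothetical loader: whichever way it exits, the handle becomes
# unreachable and the finalizer is then responsible for closing it.
load_file <- function(path) {
  h <- new_handle(path)
  stop("simulated failure part-way through the load")
}

try(load_file("example.h5"), silent = TRUE)
invisible(gc())   # force a collection so pending finalizers get a chance to run
length(open_ids)  # number of ids still recorded as open
```

The point of the design is that cleanup is tied to the lifetime of the handle object rather than to the exit of one particular R function, so it does not depend on which frame on.exit is evaluated in. In a package with C code, the handle would more likely be an external pointer with a C finalizer registered through R_RegisterCFinalizerEx; that variant is not shown here.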
On Fri, 4 Mar 2011, H C Pumphrey wrote:
> Dear all,
>
> I'm trying to fix a subtle bug in the hdf5 package. This package provides an
> interface to the HDF5 library and hence allows one to load data into R from
> files in the HDF5 format. The bug appeared during a period in which R changed
> but the package did not.
>
> I include below both the R and C code, stripped of everything except what is
> needed to show the bug. What is supposed to happen is
>
> (*) the user calls R function hdf5load()
> (*) hdf5load() calls C function do_hdf5load()
> (*) do_hdf5load() opens the HDF5 file recording its HDF5 file id (fid)
> (*) do_hdf5load() calls C function setup_onexit, passing fid to it
> (*) setup_onexit sets up the on.exit call to be R function hdf5cleanup with
> fid as its argument
> (*) C function do_hdf5load() walks the HDF5 file's tree structure of groups
> of groups of [...] of datasets, mapping them to an R list of lists of [...]
> of array variables. This recursive procedure may have a variety of exit
> points buried inside itself.
> (*) C function do_hdf5load() exits for some reason. R function hdf5load()
> therefore exits but before doing so it calls its on.exit code (which is
> hdf5cleanup(fid) with the right value of fid), closing the file.
>
> The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is
> usually not called, meaning that the file is left open. You might not notice
> this, but if you are processing a few years' worth of data, which is stored
> at 1 file per day, you may end up with the system limit number of files open
> and be unable to open any more.
>
> I have a suspicion that the problem dates to a change in R at 2.8.0. If you
> do help(on.exit) it notes under "Details" that: "Where ‘expr’ was evaluated
> changed in R 2.8.0 ..." But it is not clear how I should modify the C code to
> force hdf5cleanup() to be reliably called when do_hdf5load() exits.
> Any help appreciated.
>
> Hugh (possibly the nearest thing to a maintainer that the hdf5 package
> currently has)
>
> (R and C code follow)
>
> #----------------------------------------------------------------
> "hdf5load" <- function (file, load = TRUE, verbosity = 0, tidy = FALSE)
> {
> call <- sys.call()
> .External("do_hdf5load", call, sys.frame(sys.parent()), file, load,
> as.integer (verbosity), as.logical(tidy),
> PACKAGE="hdf5")
> }
>
> "hdf5cleanup" <- function (fid)
> {
> call <- sys.call()
> print("In hdf5cleanup: calling do_hdf5cleanup")
> invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid,
> PACKAGE="hdf5"))
> }
> #----------------------------------------------------------------
>
>
> /*---------------------------------------------------------------*/
> SEXP do_hdf5load (SEXP args)
> {
> /* Code to process args snipped */
> if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0)
> errorcall (call, "unable to open HDF file: %s", path);
>
> setup_onexit (fid, env);
> /* Messy code to walk tree structure of file snipped */
> }
>
> /* The following function shown in its entirety */
> void setup_onexit (hid_t fid, SEXP env)
> {
> eval (lang2 (install ("on.exit"),
> lang2 (install ("hdf5cleanup"),
> ScalarInteger (fid))),
> env);
> }
>
> SEXP
> do_hdf5cleanup (SEXP args)
> {
> /* Code to process args snipped */
> /* various cleanup things done including this: */
> H5Fclose(fid);
> }
> /*---------------------------------------------------------------*/
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595