[R-sig-Geo] GDAL.close

Roger Bivand Roger.Bivand at nhh.no
Tue Nov 5 14:56:59 CET 2013


On Tue, 5 Nov 2013, Oliver Soong wrote:

> Hi Roger,
>
> Toggling on GDALTransientDataset in GDAL.close doesn't change anything
> because RGDAL_CloseDataset already toggles on that.  In either case,
> RGDAL_DeleteHandle still doesn't work.  Also, the gc at the end doesn't do
> anything useful because .setCollectorFun has already trivialized the
> finalizer and because dataset still exists at that point, as does the
> source dataset given to GDAL.close in the first place.
>
> The HFA driver has the exact same issue.  From what I see in the code, I
> don't think it's related to the specific driver.

Here is the next version:

http://win-builder.r-project.org/0X0318s8iW0C

with:

             .setCollectorFun(slot(dataset, 'handle'), NULL)
             .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
             .Call("RGDAL_CloseHandle", slot(dataset, 'handle'),
                 PACKAGE="rgdal")

Running under Windows, and using:

http://download.sysinternals.com/files/Handle.zip

to check, the three transient datasets are open:

   3B8: File  (RW-)   C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\awlr2.tif
   3C0: File  (RW-)   C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\tfur1.tif
   498: File  (RW-)   C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\mxbr3.tif

I'm assuming that RW- means open for reading and writing; in other entries 
the third character is D, which probably means directory.
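
A rough way to make the same check from inside R, without the external Handle tool, is to record the prior content of tempdir(), run the code in question, and see which newly created files survive unlink(). The following is only a minimal sketch using base R: the helper name leftover_temp_files is hypothetical and not part of rgdal, and a plain file connection stands in for a GDAL dataset.

```r
# Hypothetical helper (not part of rgdal): run an expression, then report
# which files newly created in tempdir() could not be removed by unlink().
leftover_temp_files <- function(expr) {
  before <- dir(tempdir(), full.names = TRUE)  # prior content, left untouched
  force(expr)                                  # evaluate the code under test
  created <- setdiff(dir(tempdir(), full.names = TRUE), before)
  unlink(created)                              # try to remove the new files
  created[file.exists(created)]                # whatever is still there
}

# Usage with base R only: the connection is closed before unlink() is
# attempted, so the file should be removable.
res <- leftover_temp_files({
  f <- tempfile(fileext = ".tif")
  con <- file(f, open = "wb")
  writeBin(as.raw(1:10), con)
  close(con)
})
print(res)
```

Wrapping the writeGDAL or GDAL.close calls in the same way should list the orphaned transient files on Windows, assuming they are still held open when unlink() is attempted.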

Roger

>
> Cheers,
> Oliver
>
>
>
>
> On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
>
>> Er, just saw your recent e-mail.  I'll take a look.
>>
>> Oliver
>>
>>
>> On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
>>
>>> I do think we've gotten a bit muddled.  I'm probably not helping, but
>>> I'll do my best.
>>>
>>> Windows XP (32-bit), 7 (32-bit), and 2008 R2 (32-bit and 64-bit),
>>> R-3.0.2, sp 1.0.13, rgdal 0.8.11, raster 2.1.49.
>>>
>>> The main problem as seen by end-users is the orphaned temporary files
>>> that you observed were left over for r1 and r2.  They cannot be removed
>>> while R is running.  I seem not to have explained r3 very well, but suffice
>>> it to say that GDAL.close(r3) creates similar orphaned temporary files,
>>> indicating GDAL.close is not functioning properly on GDALTransientDataset
>>> objects under Windows.
>>>
>>> I think RGDAL_DeleteHandle (and hence RGDAL_CloseDataset) is the root of
>>> the problem, and I think it's not properly closing the file handle before
>>> trying and failing to delete the associated files.  Windows automatically
>>> locks open file handles, but linux requires extra steps that are not always
>>> done and are not always respected, which is probably why this isn't
>>> apparent under linux.  The finalizer code and resetting seems appropriate.
>>>
>>> I still hesitate to say much about RGDAL_DeleteHandle, but I will point
>>> out that one is normally supposed to close the file handle before deleting
>>> the file, and it seems backwards in RGDAL_DeleteHandle.  I don't know if
>>> that is intentional.
>>>
>>> After thinking a little more, I think it's better to switch the calls to
>>> RGDAL_CloseHandle and RGDAL_CloseDataset that I suggested originally for
>>> GDAL.close.  That means simply adding the call to RGDAL_CloseHandle after
>>> the call to RGDAL_CloseDataset, rather than before.  With this code, if
>>> RGDAL_CloseDataset behaves as intended, RGDAL_CloseHandle will get a nil
>>> pointer and won't do anything.  However, if RGDAL_CloseDataset fails to
>>> function properly, RGDAL_CloseHandle will close the open handle and the
>>> if(isTrans) cleanup code already in GDAL.close will operate.
>>>
>>> Perhaps I could be more helpful if you explained what you thought my
>>> suggested change might break?  This last one (the existing
>>> RGDAL_CloseDataset followed by an additional RGDAL_CloseHandle) should be
>>> no worse than the current code.
>>>
>>> Is that at all clearer?
>>>
>>> Oliver
>>>
>>>
>>> On Mon, Nov 4, 2013 at 1:03 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>>
>>>> Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run as
>>>> administrator or not? I agree that the code in those parts of rgdal is not
>>>> well-designed - it was well-designed, but has been modified so that it
>>>> works for most people cross-platform, and has had to accommodate changes
>>>> that have taken place in GDAL over more than 10 years, not least the
>>>> error-handler.
>>>>
>>>> The simple solution to your practical problem is for you to use a
>>>> larger temporary drive under Windows, or change to an operating system that
>>>> does not have these side-effects.
>>>>
>>>> Assisting you is not just a matter of doing what you think works for
>>>> you, but making sure it doesn't break anything else for anybody else
>>>> cross-platform.
>>>>
>>>> Your script does not check for other files in tempdir, so I prepended a
>>>> listing of prior content:
>>>>
>>>> pc <- dir(tempdir())
>>>>
>>>> and dropped them from the list for unlinking:
>>>>
>>>> now <- dir(tempdir())
>>>> unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
>>>>
>>>> I do not see how your script exercises the problem. It creates a new
>>>> transient file, but does not close it, which was the behaviour you are
>>>> unhappy with. If I add
>>>>
>>>> GDAL.close(r3)
>>>>
>>>> on Linux, the transient dataset is removed. On Windows 7 64-bit with the
>>>> CRAN rgdal binary run as user, temporary files are left in tempdir for r1,
>>>> r2, and r3. The same three temporary files are left when run as
>>>> administrator.
>>>>
>>>> The earliest version of GDAL.close was:
>>>>
>>>> GDAL.close <- function(dataset) {
>>>>             .setCollectorFun(slot(dataset, 'handle'), NULL)
>>>>             .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>>>>             invisible()
>>>> }
>>>>
>>>> with a version in 2007 in the THK branch calling a closeDataset method,
>>>> containing:
>>>>
>>>>             handle <- slot(dataset, "handle")
>>>>             unreg.finalizer(handle)
>>>>             .Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
>>>>
>>>> with:
>>>>
>>>> unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
>>>>
>>>> and by 2010 was:
>>>>
>>>> GDAL.close <- function(dataset) {
>>>>             .setCollectorFun(slot(dataset, 'handle'), NULL)
>>>>             .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>>>>             invisible(gc())
>>>> }
>>>>
>>>> Special handling of GDALTransientDataset was added in revision 433 in
>>>> January 2013, and modified in revision 462 in April 2013.
>>>>
>>>> It has seemed IIRC that Windows can treat arbitrary files as open. It is
>>>> also possible that there is an interaction between Windows and
>>>> rgdal:::.setCollectorFun(), which does what it should, when given the NULL
>>>> argument, setting:
>>>>
>>>> .setCollectorFun <- function(object, fun) {
>>>>
>>>>   if (is.null(fun)) fun <- function(obj) obj
>>>>   reg.finalizer(object, fun, onexit=TRUE)
>>>>
>>>> }
>>>>
>>>> so incorporating the THK branch logic. It could possibly also vary
>>>> across drivers, so finding a robust fix means setting up a test rig with
>>>> multiple Windows machines and testing for multiple drivers to see why some
>>>> temporary files are being treated as open when other operating systems don't
>>>> have problems in their removal. This matters for Windows users with too small
>>>> temporary directories. I welcome contributions from people who understand
>>>> Windows and can actually explain why we see the consequences we see.
>>>>
>>>> One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle,
>>>> PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once
>>>> the package has gone through win-builder.
>>>>
>>>> Hope this doesn't muddle too much, clarification doesn't seem like the
>>>> right expression.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>> On Sun, 3 Nov 2013, Oliver Soong wrote:
>>>>
>>>>> I've been using the CRAN rgdal and raster.  I apologize in advance for all
>>>>> the linebreaks that will be broken.  This code should highlight the problem
>>>>> and the fix:
>>>>>
>>>>>
>>>>>
>>>>> require(rgdal)
>>>>> require(raster)
>>>>> r1 <- raster(system.file("external/test.grd", package="raster"))
>>>>> r2 <- as(r1, "SpatialGridDataFrame")
>>>>> r2.dims <- gridparameters(r2)$cells.dim
>>>>> r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"),
>>>>>   rows = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32",
>>>>>   options = NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
>>>>> print(dir(tempdir()))
>>>>> writeRaster(r1, file.path(tempdir(), "r1.tif"))
>>>>> writeGDAL(r2, file.path(tempdir(), "r2.tif"))
>>>>> print(dir(tempdir()))
>>>>> unlink(dir(tempdir(), full.names = TRUE))
>>>>> print(dir(tempdir()))
>>>>> leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
>>>>> invisible(lapply(paste("cmd /c del", leftover), system))
>>>>> rm(r1, r2, r3)
>>>>> gc()
>>>>> unlink(dir(tempdir(), full.names = TRUE))
>>>>> print(dir(tempdir()))
>>>>> invisible(lapply(paste("cmd /c del", leftover), system))
>>>>>
>>>>>
>>>>>
>>>>> Basically, I'm trying to write a standard raster package raster (r1) and an
>>>>> sp package SpatialGridDataFrame (r2).  Both of those end up calling
>>>>> new("GDALTransientDataset"), hence r3.  At the first print(dir(tempdir())),
>>>>> only r3 has an open temporary file, which is expected.  At the second, all
>>>>> three have open temporary files, and r1 and r2 have their written final
>>>>> outputs, which are closed.  The temporary files for r1 and r2 should have
>>>>> been closed at this point.  None of the temporary files can be removed by
>>>>> unlink, although the final outputs can, as shown at the third
>>>>> print(dir(tempdir())).  Windows can't remove them, either.  However, if I
>>>>> remove the GDALTransientDataset r3 and initiate gc(), R can remove that
>>>>> temporary file, but this does not work for r1 and r2.  After q(), the
>>>>> tempdir() will not be removed by R, but it and the temporary files for r1
>>>>> and r2 can now be removed.
>>>>>
>>>>> It looks like GDAL.close is broken (again/as always), but the collector
>>>>> function for GDALTransientDataset seems to at least close the handle.
>>>>> GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
>>>>> collector just uses RGDAL_CloseHandle.  With the handle closed, I think the
>>>>> unlink code in GDAL.close will work (as an aside, I'd use the pattern
>>>>> paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
>>>>> might be simpler than constructing flf separately).  I believe
>>>>> RGDAL_CloseDataset checks for NULL handles but just returns early, so it
>>>>> should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
>>>>> .Call("RGDAL_CloseHandle", ...).
>>>>>
>>>>> Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
>>>>> enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
>>>>> or why RGDAL_CloseHandle is commented out to make any useful suggestions
>>>>> there.
>>>>>
>>>>> Cheers,
>>>>> Oliver
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>>>>
>>>>>> On Mon, 28 Oct 2013, Oliver Soong wrote:
>>>>>>
>>>>>>> I've had a long standing struggle with GDAL.close on Windows, and I
>>>>>>> think I might finally have found a fix.  I'm currently running rgdal
>>>>>>> 0.8.11, R 3.0.2, and 32-bit Windows 7.
>>>>>>>
>>>>>>> Currently, writeRaster and writeGDAL create temporary files in the
>>>>>>> tempdir() folder (the final filename prefixed with 3 random [a-z]
>>>>>>> letters).  On my system, these files get left open and orphaned.  When
>>>>>>> doing heavy processing, this can lead to the drive hosting the
>>>>>>> tempdir() folder to become full, even if the data is being ultimately
>>>>>>> written to a much larger drive.  This also means that R cannot clean
>>>>>>> up these files or the tempdir() folder when it closes, causing similar
>>>>>>> bloat in my %TEMP%.
>>>>>>>
>>>>>>> I haven't tested this on other platforms, but I think it might help to
>>>>>>> insert an extra line into GDAL.close:
>>>>>>>
>>>>>>> .setCollectorFun(slot(dataset, "handle"), NULL)
>>>>>>> .Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
>>>>>>> .Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
>>>>>>>
>>>>>>> For whatever reason, RGDAL_CloseDataset doesn't seem to actually close
>>>>>>> the C file handle, but it doesn't seem to mind if the file handle was
>>>>>>> closed beforehand.
>>>>>>>
>>>>>>>
>>>>>> Could you please provide a working example? I have looked at this, but
>>>>>> need a baseline to know whether I'm looking at the same thing. I'm very
>>>>>> unsure that this is a robust solution, and need an instrumented example,
>>>>>> including listings of the temporary directory during the process, to see
>>>>>> the consequences. Thanks for looking into this, but I'd prefer to be sure
>>>>>> that a Windows-specific fix doesn't make things worse for others too.
>>>>>> Please also report on the source of your Windows rgdal binary - is it from
>>>>>> CRAN or locally built dynamically linking your own GDAL?
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>> Oliver
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-Geo mailing list
>>>>>>> R-sig-Geo at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Roger Bivand
>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>> e-mail: Roger.Bivand at nhh.no

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


