[R-sig-Geo] GDAL.close
Roger Bivand
Roger.Bivand at nhh.no
Tue Nov 5 20:50:45 CET 2013
On Tue, 5 Nov 2013, Oliver Soong wrote:
> Hi Roger,
>
> I forgot that RGDAL_CloseDataset nils dataset at handle, which means the
> subsequent RGDAL_CloseHandle can't find the handle to close. It also means
> that GDAL.close should replace RGDAL_CloseDataset with RGDAL_CloseHandle.
> On linux, this will cause temporary datasets to be removed by the isTrans
> unlink code instead of GDALDeleteDataset, but it should make things work on
> Windows. I don't know if there are drivers that might create files that
> don't match the unlink pattern.
>
> I do think it would be better to ultimately fix RGDAL_DeleteHandle. Under
> normal circumstances, I would call RGDAL_CloseHandle (or equivalently call
> GDALClose and R_ClearExternalPtr directly) and then GDALDeleteDataset. I'm
> still uncertain why the ordering is reversed or what the (commented out)
> #ifndef OSGEO4W deleteFile business is about.
Thanks to Even Rouault on gdal-dev, I think that:
http://win-builder.r-project.org/jJIzeDT1C100
contains a fixed version, with RGDAL_DeleteHandle now closing the dataset
before deleting it. This worked on Linux and Windows in a rough cut, and
this one after removing the unneeded unlink() call in GDAL.close() now
works on Linux, and should also work on Windows. Behaviour on Linux and
Windows with respect to transient datasets being deleted is now the same.
It was helpful that you stayed with this, it was far from simple, as Even
explained on the gdal-dev thread.
Roger
>
> Cheers,
> Oliver
>
>
>
>
> On Tue, Nov 5, 2013 at 5:56 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Tue, 5 Nov 2013, Oliver Soong wrote:
>>
>> Hi Roger,
>>>
>>> Toggling on GDALTransientDataset in GDAL.close doesn't change anything
>>> because RGDAL_CloseDataset already toggles on that. In either case,
>>> RGDAL_DeleteHandle still doesn't work. Also, the gc at the end doesn't do
>>> anything useful because .setCollectorFun has already trivialized the
>>> finalizer and because dataset still exists at that point, as does the
>>> source dataset given to GDAL.close in the first place.
>>>
>>> The HFA driver has the exact same issue. From what I see in the code, I
>>> don't think it's related to the specific driver.
>>>
>>
>> Here is the next version:
>>
>> http://win-builder.r-project.org/0X0318s8iW0C
>>
>> with:
>>
>>
>> .setCollectorFun(slot(dataset, 'handle'), NULL)
>> .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>> .Call("RGDAL_CloseHandle", slot(dataset, 'handle'),
>> PACKAGE="rgdal")
>>
>> Running under Windows, and using:
>>
>> http://download.sysinternals.com/files/Handle.zip
>>
>> to check, the three transient datasets are open:
>>
>> 3B8: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\awlr2.tif
>> 3C0: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\tfur1.tif
>> 498: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\mxbr3.tif
>>
>> I'm assuming that RW- means read and write open, the third character is D,
>> which is probably directory.
>>
>> Roger
>>
>>
>>
>>> Cheers,
>>> Oliver
>>>
>>>
>>>
>>>
>>> On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
>>>
>>> Er, just saw your recent e-mail. I'll take a look.
>>>>
>>>> Oliver
>>>>
>>>>
>>>> On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
>>>>
>>>> I do think we've gotten a bit muddled. I'm probably not helping, but
>>>>> I'll do my best.
>>>>>
>>>>> Windows XP (32-bit), 7 (32-bit), and 2008 R2 (32-bit and 64-bit),
>>>>> R-3.0.2, sp 1.0.13, rgdal 0.8.11, raster 2.1.49.
>>>>>
>>>>> The main problem as seen by end-users is the orphaned temporary files
>>>>> that you observed were left over for r1 and r2. They cannot be removed
>>>>> while R is running. I seem not to have explained r3 very well, but
>>>>> suffice
>>>>> it to say that GDAL.close(r3) creates similar orphaned temporary files,
>>>>> indicating GDAL.close is not functioning properly on
>>>>> GDALTransientDataset
>>>>> objects under Windows.
>>>>>
>>>>> I think RGDAL_DeleteHandle (and hence RGDAL_CloseDataset) is the root of
>>>>> the problem, and I think it's not properly closing the file handle
>>>>> before
>>>>> trying and failing to delete the associated files. Windows
>>>>> automatically
>>>>> locks open file handles, but linux requires extra steps that are not
>>>>> always
>>>>> done and are not always respected, which is probably why this isn't
>>>>> apparent under linux. The finalizer code and resetting seems
>>>>> appropriate.
>>>>>
>>>>> I still hesitate to say much about RGDAL_DeleteHandle, but I will point
>>>>> out that one is normally supposed to close the file handle before
>>>>> deleting
>>>>> the file, and it seems backwards in RGDAL_DeleteHandle. I don't know if
>>>>> that is intentional.
>>>>>
>>>>> After thinking a little more, I think it's better to switch the calls to
>>>>> RGDAL_CloseHandle and RGDAL_CloseDataset that I suggested originally for
>>>>> GDAL.close. That means simply adding the call to RGDAL_CloseHandle
>>>>> after
>>>>> the call to RGDAL_CloseDataset, rather than before. With this code, if
>>>>> RGDAL_CloseDataset behaves as intended, RGDAL_CloseHandle will get a nil
>>>>> pointer and won't do anything. However, if RGDAL_CloseDataset fails to
>>>>> function properly, RGDAL_CloseHandle will close the open handle and the
>>>>> if(isTrans) cleanup code already in GDAL.close will operate.
>>>>>
>>>>> Perhaps I could be more helpful if you explained what you thought my
>>>>> suggested change might break? This last one (the existing
>>>>> RGDAL_CloseDataset followed by an additional RGDAL_CloseHandle) should
>>>>> be
>>>>> no worse than the current code.
>>>>>
>>>>> Is that at all clearer?
>>>>>
>>>>> Oliver
>>>>>
>>>>>
>>>>> On Mon, Nov 4, 2013 at 1:03 PM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>> wrote:
>>>>>
>>>>> Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run
>>>>>> as
>>>>>> administrator or not? I agree that the code in those parts of rgdal is
>>>>>> not
>>>>>> well-designed - it was well-designed, but has been modified so that it
>>>>>> works for most people cross-platform, and has had to accommodate
>>>>>> changes
>>>>>> that have taken place in GDAL over more than 10 years, not least the
>>>>>> error-handler.
>>>>>>
>>>>>> The simple solution to your practical problem is to for you to use a
>>>>>> larger temporary drive under Windows, or change to an operating system
>>>>>> that
>>>>>> does not have these side-effects.
>>>>>>
>>>>>> Assisting you is not just a matter of doing what you think works for
>>>>>> you, but making sure it doesn't break anything else for anybody else
>>>>>> cross-platform.
>>>>>>
>>>>>> Your script does not check for other files in tempdir, so I prepended a
>>>>>> listing of prior content:
>>>>>>
>>>>>> pc <- dir(tempdir())
>>>>>>
>>>>>> and dropped them from the list for unlinking:
>>>>>>
>>>>>> now <- dir(tempdir())
>>>>>> unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
>>>>>>
>>>>>> I do not see how your script exercises the problem. It creates a new
>>>>>> transient file, but does not close it, which was the behaviour you are
>>>>>> unhappy with. If I add
>>>>>>
>>>>>> GDAL.close(r3)
>>>>>>
>>>>>> on Linux, the transient dataset is removed. On Windows 7 64-bit with
>>>>>> the
>>>>>> CRAN rgdal binary run as user, temporary files are left in tempdir for
>>>>>> r1,
>>>>>> r2, and r3. The same three temporary files are left when run as
>>>>>> administrator.
>>>>>>
>>>>>> The earliest version of GDAL.close was:
>>>>>>
>>>>>> GDAL.close <- function(dataset) {
>>>>>> .setCollectorFun(slot(dataset, 'handle'), NULL)
>>>>>> .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>>>>>> invisible()
>>>>>> }
>>>>>>
>>>>>> with a version in 2007 in the THK branch calling a closeDataset method,
>>>>>> containing:
>>>>>>
>>>>>> handle <- slot(dataset, "handle")
>>>>>> unreg.finalizer(handle)
>>>>>> .Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
>>>>>>
>>>>>> with:
>>>>>>
>>>>>> unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
>>>>>>
>>>>>> and by 2010 was:
>>>>>>
>>>>>> GDAL.close <- function(dataset) {
>>>>>> .setCollectorFun(slot(dataset, 'handle'), NULL)
>>>>>> .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>>>>>> invisible(gc())
>>>>>> }
>>>>>>
>>>>>> Special handling of GDALTransientDataset was added in revision 433 in
>>>>>> Janual 2013, and modified in revision 462 in April 2013.
>>>>>>
>>>>>> It has seemed IIRC that Windows can treat arbitrary files as open. It
>>>>>> is
>>>>>> also possible that there is an interaction between Windows and
>>>>>> rgdal:::.setCollectorFun(), which does what it should, when given the
>>>>>> NULL
>>>>>> argument, setting:
>>>>>>
>>>>>> .setCollectorFun <- function(object, fun) {
>>>>>>
>>>>>> if (is.null(fun)) fun <- function(obj) obj
>>>>>> reg.finalizer(object, fun, onexit=TRUE)
>>>>>>
>>>>>> }
>>>>>>
>>>>>> so incorporating the THK branch logic. It could possibly also vary
>>>>>> across drivers, so finding a robust fix means setting up a test rig
>>>>>> with
>>>>>> multiple Windows machines and testing for multiple drivers to see why
>>>>>> some
>>>>>> temprary files are being treated as open when other operating systems
>>>>>> don't
>>>>>> have problems in their removal. Windows users with too small temporary
>>>>>> directories. I welcome contributions from people who understand
>>>>>> Windows and
>>>>>> can actually explain why we see the consequences we see.
>>>>>>
>>>>>> One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle,
>>>>>> PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back
>>>>>> once
>>>>>> the package has gone through win-builder.
>>>>>>
>>>>>> Hope this doesn't muddle too much, clarification doesn't seem like the
>>>>>> right expression.
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, 3 Nov 2013, Oliver Soong wrote:
>>>>>>
>>>>>> I've been using the CRAN rgdal and raster. I apologize in advance for
>>>>>>
>>>>>>> all
>>>>>>> the linebreaks that will be broken. This code should highlight the
>>>>>>> problem
>>>>>>> and the fix:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> require(rgdal)
>>>>>>> require(raster)
>>>>>>> r1 <- raster(system.file("external/test.grd", package="raster"))
>>>>>>> r2 <- as(r1, "SpatialGridDataFrame")
>>>>>>> r2.dims <- gridparameters(r2)$cells.dim
>>>>>>> r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"),
>>>>>>> rows
>>>>>>> = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32", options
>>>>>>> =
>>>>>>> NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
>>>>>>> print(dir(tempdir()))
>>>>>>> writeRaster(r1, file.path(tempdir(), "r1.tif"))
>>>>>>> writeGDAL(r2, file.path(tempdir(), "r2.tif"))
>>>>>>> print(dir(tempdir()))
>>>>>>> unlink(dir(tempdir(), full.names = TRUE))
>>>>>>> print(dir(tempdir()))
>>>>>>> leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
>>>>>>> invisible(lapply(paste("cmd /c del", leftover), system))
>>>>>>> rm(r1, r2, r3)
>>>>>>> gc()
>>>>>>> unlink(dir(tempdir(), full.names = TRUE))
>>>>>>> print(dir(tempdir()))
>>>>>>> invisible(lapply(paste("cmd /c del", leftover), system))
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Basically, I'm trying to write a standard raster package raster (r1)
>>>>>>> and an
>>>>>>> sp package SpatialGridDataFrame (r2). Both of those end up calling
>>>>>>> new("GDALTransientDataset"), hence r3. At the first
>>>>>>> print(dir(tempdir())),
>>>>>>> only r3 has an open temporary file, which is expected. At the second,
>>>>>>> all
>>>>>>> three have open temporary files, and r1 and r2 have their written
>>>>>>> final
>>>>>>> outputs, which are closed. The temporary files for r1 and r2 should
>>>>>>> have
>>>>>>> been closed at this point. None of the temporary files can be removed
>>>>>>> by
>>>>>>> unlink, although the final outputs can, as shown at the third
>>>>>>> print(dir(tempdir())). Windows can't remove them, either. However,
>>>>>>> if
>>>>>>> I
>>>>>>> remove the GDALTransientDataset r3 and initiate gc(), R can remove
>>>>>>> that
>>>>>>> temporary file, but this does not work for r1 and r2. After q(), the
>>>>>>> tempdir() will not be removed by R, but it and the temporary files for
>>>>>>> r1
>>>>>>> and r2 can now be removed.
>>>>>>>
>>>>>>> It looks like GDAL.close is broken (again/as always), but the
>>>>>>> collector
>>>>>>> function for GDALTransientDataset seems to at least close the handle.
>>>>>>> GDAL.close relies on RGDAL_CloseDataset, whereas the
>>>>>>> GDALTransientDataset
>>>>>>> collector just uses RGDAL_CloseHandle. With the handle closed, I
>>>>>>> think
>>>>>>> the
>>>>>>> unlink code in GDAL.close will work (as an aside, I'd use the pattern
>>>>>>> paste0("^[a-z]{3}", basen, "$") to be safer and the argument
>>>>>>> full.names
>>>>>>> might be simpler than constructing flf separately). I believe
>>>>>>> RGDAL_CloseDataset checks for NULL handles but just returns early, so
>>>>>>> it
>>>>>>> should be the same to replace the .Call("RGDAL_CloseDataset", ...)
>>>>>>> with
>>>>>>> .Call("RGDAL_CloseHandle", ...).
>>>>>>>
>>>>>>> Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
>>>>>>> enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile
>>>>>>> business
>>>>>>> or why RGDAL_CloseHandle is commented out to make any useful
>>>>>>> suggestions
>>>>>>> there.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Oliver
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, 28 Oct 2013, Oliver Soong wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I've had a long standing struggle with GDAL.close on Windows, and I
>>>>>>>>
>>>>>>>> think I might finally have found a fix. I'm currently running rgdal
>>>>>>>>> 0.8.11, R 3.0.2, and 32-bit Windows 7.
>>>>>>>>>
>>>>>>>>> Currently, writeRaster and writeGDAL create temporary files in the
>>>>>>>>> tempdir() folder (the final filename prefixed with 3 random [a-z]
>>>>>>>>> letters). On my system, these files get left open and orphaned.
>>>>>>>>> When
>>>>>>>>> doing heavy processing, this can lead to the drive hosting the
>>>>>>>>> tempdir() folder to become full, even if the data is being
>>>>>>>>> ultimately
>>>>>>>>> written to a much larger drive. This also means that R cannot clean
>>>>>>>>> up these files or the tempdir() folder when it closes, causing
>>>>>>>>> similar
>>>>>>>>> bloat in my %TEMP%.
>>>>>>>>>
>>>>>>>>> I haven't tested this on other platforms, but I think it might help
>>>>>>>>> to
>>>>>>>>> insert an extra line into GDAL.close:
>>>>>>>>>
>>>>>>>>> .setCollectorFun(slot(dataset, "handle"), NULL)
>>>>>>>>> .Call("RGDAL_CloseHandle", dataset at handle, PACKAGE = "rgdal")
>>>>>>>>> .Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
>>>>>>>>>
>>>>>>>>> For whatever reason, RGDAL_CloseDataset doesn't seem to actually
>>>>>>>>> close
>>>>>>>>> the C file handle, but it doesn't seem to mind if the file handle
>>>>>>>>> was
>>>>>>>>> closed beforehand.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you please provide a working example? I have looked at this,
>>>>>>>> but
>>>>>>>> need a baseline to know whether I'm looking at the same thing. I'm
>>>>>>>> very
>>>>>>>> unsure that this is a robust solution, and need an instrumented
>>>>>>>> example,
>>>>>>>> including listings of the temporary directory during the process, to
>>>>>>>> see
>>>>>>>> the consequences. Thanks for looking into this, but I'd prefer to be
>>>>>>>> sure
>>>>>>>> that a Windows-specific fix doesn't make things worse for others too.
>>>>>>>> Please also report on the source of your Windows rgdal binary - is it
>>>>>>>> from
>>>>>>>> CRAN or locally built dynamically linking your own GDAL?
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Roger
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>> Oliver
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>> Roger Bivand
>>>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>>>> e-mail: Roger.Bivand at nhh.no
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>> Roger Bivand
>>>>>> Department of Economics, NHH Norwegian School of Economics,
>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>>>>> e-mail: Roger.Bivand at nhh.no
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>> --
>> Roger Bivand
>> Department of Economics, NHH Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>>
>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list