[R-sig-Geo] GDAL.close

Roger Bivand Roger.Bivand at nhh.no
Mon Nov 4 23:17:40 CET 2013


On Mon, 4 Nov 2013, Roger Bivand wrote:

> Which Windows? XP, Vista, 7, 8, 8.1? 32 or 64 bit? If Vista/7/8, run as 
> administrator or not? I agree that the code in those parts of rgdal is not 
> well-designed - it was well-designed, but has been modified so that it works 
> for most people cross-platform, and has had to accommodate changes that have 
> taken place in GDAL over more than 10 years, not least the error-handler.
>
> The simple solution to your practical problem is for you to use a larger 
> temporary drive under Windows, or change to an operating system that does not 
> have these side-effects.
>
> Assisting you is not just a matter of doing what you think works for you, but 
> making sure it doesn't break anything else for anybody else cross-platform.
>
> Your script does not check for other files in tempdir, so I prepended a 
> listing of prior content:
>
> pc <- dir(tempdir())
>
> and dropped them from the list for unlinking:
>
> now <- dir(tempdir())
> unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
>
> I do not see how your script exercises the problem. It creates a new 
> transient file, but does not close it, which was the behaviour you are 
> unhappy with. If I add
>
> GDAL.close(r3)
>
> on Linux, the transient dataset is removed. On Windows 7 64-bit with the CRAN 
> rgdal binary run as user, temporary files are left in tempdir for r1, r2, and 
> r3. The same three temporary files are left when run as administrator.
>
> The earliest version of GDAL.close was:
>
> GDAL.close <- function(dataset) {
>            .setCollectorFun(slot(dataset, 'handle'), NULL)
>            .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>            invisible()
> }
>
> with a version in 2007 in the THK branch calling a closeDataset method, 
> containing:
>
>            handle <- slot(dataset, "handle")
>            unreg.finalizer(handle)
>            .Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
>
> with:
>
> unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
>
> and by 2010 was:
>
> GDAL.close <- function(dataset) {
>            .setCollectorFun(slot(dataset, 'handle'), NULL)
>            .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
>            invisible(gc())
> }
>
> Special handling of GDALTransientDataset was added in revision 433 in January 
> 2013, and modified in revision 462 in April 2013.
>
> It has seemed IIRC that Windows can treat arbitrary files as open. It is also 
> possible that there is an interaction between Windows and 
> rgdal:::.setCollectorFun(), which does what it should, when given the NULL 
> argument, setting:
>
> .setCollectorFun <- function(object, fun) {
>
>  if (is.null(fun)) fun <- function(obj) obj
>  reg.finalizer(object, fun, onexit=TRUE)
>
> }
>
> so incorporating the THK branch logic. It could possibly also vary across 
> drivers, so finding a robust fix means setting up a test rig with multiple 
> Windows machines and testing for multiple drivers to see why some temporary 
> files are being treated as open when other operating systems don't have 
> problems in their removal. This matters most for Windows users with too small 
> temporary directories. I welcome contributions from people who understand 
> Windows and can actually explain why we see the consequences we see.
>
> One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle, 
> PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once the 
> package has gone through win-builder.

The Windows binary with this modification is at:

http://win-builder.r-project.org/TO95cIM24UVL

but I do not see that it has altered behaviour under Windows 7 running as 
user. For some reason Windows sees the transient files as open. Please try 
other drivers to ensure that this isn't driver-specific.
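
A minimal sketch of how such a check could be scripted, using base R only. 
It follows the same idea as the listing of prior content quoted above: 
record what is in tempdir() beforehand, run the write, try to unlink whatever 
is new, and report what survives. The helper name leftover_after and its 
arguments are hypothetical and not part of rgdal; the probe file is only a 
stand-in for a real raster write.

```r
# Hypothetical helper (not part of rgdal): run write_fun(), then report which
# files newly created in tmp could not be removed by unlink().
leftover_after <- function(write_fun, tmp = tempdir()) {
  prior <- dir(tmp)                      # content present before the call
  write_fun()                            # the write being tested
  created <- setdiff(dir(tmp), prior)    # names of files the call created
  paths <- file.path(tmp, created)
  unlink(paths)                          # attempt removal of the new files only
  created[file.exists(paths)]            # names that are still present
}

# Base-R stand-in for a raster write: the connection is closed by writeLines()
# before unlink() runs, so this file should be removable on any platform.
res <- leftover_after(function() {
  writeLines("x", file.path(tempdir(), "probe.txt"))
})
print(res)
```

With rgdal loaded and r2 as in the script quoted below, the function passed 
in could instead call writeGDAL(r2, file.path(tempdir(), "r2.tif"), 
drivername = "GTiff"), repeated with other driver names, so that the 
leftovers can be compared across drivers and machines.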

Roger



>
> Hope this doesn't muddle too much, clarification doesn't seem like the right 
> expression.
>
> Roger
>
>
> On Sun, 3 Nov 2013, Oliver Soong wrote:
>
>> I've been using the CRAN rgdal and raster.  I apologize in advance for all
>> the linebreaks that will be broken.  This code should highlight the problem
>> and the fix:
>> 
>> 
>> 
>> require(rgdal)
>> require(raster)
>> r1 <- raster(system.file("external/test.grd", package="raster"))
>> r2 <- as(r1, "SpatialGridDataFrame")
>> r2.dims <- gridparameters(r2)$cells.dim
>> r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"), rows
>> = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32", options =
>> NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
>> print(dir(tempdir()))
>> writeRaster(r1, file.path(tempdir(), "r1.tif"))
>> writeGDAL(r2, file.path(tempdir(), "r2.tif"))
>> print(dir(tempdir()))
>> unlink(dir(tempdir(), full.names = TRUE))
>> print(dir(tempdir()))
>> leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
>> invisible(lapply(paste("cmd /c del", leftover), system))
>> rm(r1, r2, r3)
>> gc()
>> unlink(dir(tempdir(), full.names = TRUE))
>> print(dir(tempdir()))
>> invisible(lapply(paste("cmd /c del", leftover), system))
>> 
>> 
>> 
>> Basically, I'm trying to write a standard raster package raster (r1) and an
>> sp package SpatialGridDataFrame (r2).  Both of those end up calling
>> new("GDALTransientDataset"), hence r3.  At the first print(dir(tempdir())),
>> only r3 has an open temporary file, which is expected.  At the second, all
>> three have open temporary files, and r1 and r2 have their written final
>> outputs, which are closed.  The temporary files for r1 and r2 should have
>> been closed at this point.  None of the temporary files can be removed by
>> unlink, although the final outputs can, as shown at the third
>> print(dir(tempdir())).  Windows can't remove them, either.  However, if I
>> remove the GDALTransientDataset r3 and initiate gc(), R can remove that
>> temporary file, but this does not work for r1 and r2.  After q(), the
>> tempdir() will not be removed by R, but it and the temporary files for r1
>> and r2 can now be removed.
>> 
>> It looks like GDAL.close is broken (again/as always), but the collector
>> function for GDALTransientDataset seems to at least close the handle.
>> GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
>> collector just uses RGDAL_CloseHandle.  With the handle closed, I think the
>> unlink code in GDAL.close will work (as an aside, I'd use the pattern
>> paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
>> might be simpler than constructing flf separately).  I believe
>> RGDAL_CloseDataset checks for NULL handles but just returns early, so it
>> should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
>> .Call("RGDAL_CloseHandle", ...).
>> 
>> Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
>> enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
>> or why RGDAL_CloseHandle is commented out to make any useful suggestions
>> there.
>> 
>> Cheers,
>> Oliver
>> 
>> 
>> 
>> 
>> On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> 
>>> On Mon, 28 Oct 2013, Oliver Soong wrote:
>>>
>>>> I've had a long standing struggle with GDAL.close on Windows, and I
>>>> think I might finally have found a fix.  I'm currently running rgdal
>>>> 0.8.11, R 3.0.2, and 32-bit Windows 7.
>>>> 
>>>> Currently, writeRaster and writeGDAL create temporary files in the
>>>> tempdir() folder (the final filename prefixed with 3 random [a-z]
>>>> letters).  On my system, these files get left open and orphaned.  When
>>>> doing heavy processing, this can cause the drive hosting the
>>>> tempdir() folder to become full, even if the data is being ultimately
>>>> written to a much larger drive.  This also means that R cannot clean
>>>> up these files or the tempdir() folder when it closes, causing similar
>>>> bloat in my %TEMP%.
>>>> 
>>>> I haven't tested this on other platforms, but I think it might help to
>>>> insert an extra line into GDAL.close:
>>>> 
>>>> .setCollectorFun(slot(dataset, "handle"), NULL)
>>>> .Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
>>>> .Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
>>>> 
>>>> For whatever reason, RGDAL_CloseDataset doesn't seem to actually close
>>>> the C file handle, but it doesn't seem to mind if the file handle was
>>>> closed beforehand.
>>>> 
>>> 
>>> Could you please provide a working example? I have looked at this, but
>>> need a baseline to know whether I'm looking at the same thing. I'm very
>>> unsure that this is a robust solution, and need an instrumented example,
>>> including listings of the temporary directory during the process, to see
>>> the consequences. Thanks for looking into this, but I'd prefer to be sure
>>> that a Windows-specific fix doesn't make things worse for others too.
>>> Please also report on the source of your Windows rgdal binary - is it from
>>> CRAN or locally built dynamically linking your own GDAL?
>>> 
>>> Best wishes,
>>> 
>>> Roger
>>> 
>>> 
>>>> Cheers,
>>>> Oliver
>>>> 
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>> 
>>>> 
>>> --
>>> Roger Bivand
>>> Department of Economics, NHH Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>> e-mail: Roger.Bivand at nhh.no
>>> 
>>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


