[R-sig-Geo] GDAL.close

Roger Bivand Roger.Bivand at nhh.no
Mon Nov 4 22:03:25 CET 2013


Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run as 
administrator or not? I agree that the code in those parts of rgdal is not 
well-designed - it was well-designed, but has been modified so that it 
works for most people cross-platform, and has had to accommodate changes 
that have taken place in GDAL over more than 10 years, not least the 
error-handler.

The simple solution to your practical problem is for you to use a 
larger temporary drive under Windows, or change to an operating system 
that does not have these side-effects.

Assisting you is not just a matter of doing what you think works for you, 
but making sure it doesn't break anything else for anybody else 
cross-platform.

Your script does not check for other files in tempdir, so I prepended a 
listing of prior content:

pc <- dir(tempdir())

and dropped them from the list for unlinking:

now <- dir(tempdir())
unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
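
As a minimal sketch of that bookkeeping, using only base R: the helper 
name new_temp_files and the stand-in file are hypothetical, and are not 
part of your script or of rgdal.

```r
# Hypothetical helper (not part of rgdal): full paths of files that have
# appeared in `path` since the snapshot `prior` was taken.
new_temp_files <- function(prior, path = tempdir()) {
  file.path(path, setdiff(dir(path), prior))
}

pc <- dir(tempdir())                  # snapshot before the script runs
f <- tempfile(fileext = ".tif")       # stand-in for a file the script creates
file.create(f)
created <- new_temp_files(pc)         # only what appeared after the snapshot
unlink(created)                       # files present beforehand are not touched
```

Using setdiff() keeps anything that was already in tempdir() out of the 
unlink() call, which is the point of taking the prior listing.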

I do not see how your script exercises the problem. It creates a new 
transient dataset, but never closes it, and closing is the behaviour you 
are unhappy with. If I add

GDAL.close(r3)

on Linux, the transient dataset is removed. On Windows 7 64-bit with the 
CRAN rgdal binary run as user, temporary files are left in tempdir for r1, 
r2, and r3. The same three temporary files are left when run as 
administrator.
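
To make such listings comparable between platforms, one option is to 
record which files survive an attempted removal. This is only a sketch, 
assuming that checking file.exists() after unlink() is an adequate signal; 
the helper name survivors is hypothetical.

```r
# Hypothetical helper: try to remove `paths` and return those still present.
survivors <- function(paths) {
  unlink(paths)
  paths[file.exists(paths)]
}

f <- tempfile(fileext = ".tif")   # stand-in for a temporary raster file
file.create(f)
left <- survivors(f)              # character(0) when the removal succeeded
```

A file that is still held open under Windows would be expected to show up 
in the returned vector rather than disappear.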

The earliest version of GDAL.close was:

GDAL.close <- function(dataset) {
             .setCollectorFun(slot(dataset, 'handle'), NULL)
             .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
             invisible()
}

with a version in 2007 in the THK branch calling a closeDataset method, 
containing:

             handle <- slot(dataset, "handle")
             unreg.finalizer(handle)
             .Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")

with:

unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)

and by 2010 was:

GDAL.close <- function(dataset) {
             .setCollectorFun(slot(dataset, 'handle'), NULL)
             .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
             invisible(gc())
}

Special handling of GDALTransientDataset was added in revision 433 in 
January 2013, and modified in revision 462 in April 2013.

IIRC, it has seemed that Windows can treat arbitrary files as open. It is 
also possible that there is an interaction between Windows and 
rgdal:::.setCollectorFun(), which does what it should when given the NULL 
argument, setting an identity finalizer:

.setCollectorFun <- function(object, fun) {

   if (is.null(fun)) fun <- function(obj) obj
   reg.finalizer(object, fun, onexit=TRUE)

}

so incorporating the THK branch logic. It could possibly also vary across 
drivers, so finding a robust fix means setting up a test rig with multiple 
Windows machines and testing for multiple drivers to see why some temporary 
files are being treated as open when other operating systems don't have 
problems in their removal. The problem mainly bites Windows users with too 
small temporary directories. I welcome contributions from people who 
understand Windows and can actually explain why we see the consequences we 
see.
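
For such a test rig, each run would need to record the platform alongside 
the leftover files. A rough sketch in base R follows; the function name 
run_record and its column names are illustrative assumptions, not anything 
in rgdal.

```r
# Hypothetical record of one test run; all field names are illustrative.
run_record <- function(driver, leftover) {
  si <- Sys.info()
  data.frame(
    driver     = driver,                       # GDAL driver under test
    os         = si[["sysname"]],              # e.g. "Windows" or "Linux"
    release    = si[["release"]],              # OS release string
    bits       = 8 * .Machine$sizeof.pointer,  # 32 or 64
    n_leftover = length(leftover),             # files that could not be removed
    stringsAsFactors = FALSE
  )
}

rec <- run_record("GTiff", character(0))
```

Binding such records together with rbind() across machines and drivers 
would give a table from which patterns might be read off.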

One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle, 
PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once 
the package has gone through win-builder.

Hope this doesn't muddle too much, clarification doesn't seem like the 
right expression.

Roger


On Sun, 3 Nov 2013, Oliver Soong wrote:

> I've been using the CRAN rgdal and raster.  I apologize in advance for all
> the linebreaks that will be broken.  This code should highlight the problem
> and the fix:
>
>
>
> require(rgdal)
> require(raster)
> r1 <- raster(system.file("external/test.grd", package="raster"))
> r2 <- as(r1, "SpatialGridDataFrame")
> r2.dims <- gridparameters(r2)$cells.dim
> r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"), rows
> = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32", options =
> NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
> print(dir(tempdir()))
> writeRaster(r1, file.path(tempdir(), "r1.tif"))
> writeGDAL(r2, file.path(tempdir(), "r2.tif"))
> print(dir(tempdir()))
> unlink(dir(tempdir(), full.names = TRUE))
> print(dir(tempdir()))
> leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
> invisible(lapply(paste("cmd /c del", leftover), system))
> rm(r1, r2, r3)
> gc()
> unlink(dir(tempdir(), full.names = TRUE))
> print(dir(tempdir()))
> invisible(lapply(paste("cmd /c del", leftover), system))
>
>
>
> Basically, I'm trying to write a standard raster package raster (r1) and an
> sp package SpatialGridDataFrame (r2).  Both of those end up calling
> new("GDALTransientDataset"), hence r3.  At the first print(dir(tempdir())),
> only r3 has an open temporary file, which is expected.  At the second, all
> three have open temporary files, and r1 and r2 have their written final
> outputs, which are closed.  The temporary files for r1 and r2 should have
> been closed at this point.  None of the temporary files can be removed by
> unlink, although the final outputs can, as shown at the third
> print(dir(tempdir())).  Windows can't remove them, either.  However, if I
> remove the GDALTransientDataset r3 and initiate gc(), R can remove that
> temporary file, but this does not work for r1 and r2.  After q(), the
> tempdir() will not be removed by R, but it and the temporary files for r1
> and r2 can now be removed.
>
> It looks like GDAL.close is broken (again/as always), but the collector
> function for GDALTransientDataset seems to at least close the handle.
> GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
> collector just uses RGDAL_CloseHandle.  With the handle closed, I think the
> unlink code in GDAL.close will work (as an aside, I'd use the pattern
> paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
> might be simpler than constructing flf separately).  I believe
> RGDAL_CloseDataset checks for NULL handles but just returns early, so it
> should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
> .Call("RGDAL_CloseHandle", ...).
>
> Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
> enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
> or why RGDAL_CloseHandle is commented out to make any useful suggestions
> there.
>
> Cheers,
> Oliver
>
>
>
>
> On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Mon, 28 Oct 2013, Oliver Soong wrote:
>>
>>> I've had a long standing struggle with GDAL.close on Windows, and I
>>> think I might finally have found a fix.  I'm currently running rgdal
>>> 0.8.11, R 3.0.2, and 32-bit Windows 7.
>>>
>>> Currently, writeRaster and writeGDAL create temporary files in the
>>> tempdir() folder (the final filename prefixed with 3 random [a-z]
>>> letters).  On my system, these files get left open and orphaned.  When
>>> doing heavy processing, this can lead to the drive hosting the
>>> tempdir() folder to become full, even if the data is being ultimately
>>> written to a much larger drive.  This also means that R cannot clean
>>> up these files or the tempdir() folder when it closes, causing similar
>>> bloat in my %TEMP%.
>>>
>>> I haven't tested this on other platforms, but I think it might help to
>>> insert an extra line into GDAL.close:
>>>
>>> .setCollectorFun(slot(dataset, "handle"), NULL)
>>> .Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
>>> .Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
>>>
>>> For whatever reason, RGDAL_CloseDataset doesn't seem to actually close
>>> the C file handle, but it doesn't seem to mind if the file handle was
>>> closed beforehand.
>>>
>>
>> Could you please provide a working example? I have looked at this, but
>> need a baseline to know whether I'm looking at the same thing. I'm very
>> unsure that this is a robust solution, and need an instrumented example,
>> including listings of the temporary directory during the process, to see
>> the consequences. Thanks for looking into this, but I'd prefer to be sure
>> that a Windows-specific fix doesn't make things worse for others too.
>> Please also report on the source of your Windows rgdal binary - is it from
>> CRAN or locally built dynamically linking your own GDAL?
>>
>> Best wishes,
>>
>> Roger
>>
>>
>>> Cheers,
>>> Oliver
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>> --
>> Roger Bivand
>> Department of Economics, NHH Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


