[R-sig-Geo] raster/rgdal- problem: Too many open files (Linux)

Roger Bivand Roger.Bivand at nhh.no
Wed Aug 7 11:49:20 CEST 2013


On Wed, 7 Aug 2013, Jon Olav Skoien wrote:

>
> On 06-Aug-13 21:35, Roger Bivand wrote:
>> On Tue, 6 Aug 2013, Mauricio Zambrano-Bigiarini wrote:
>> 
>>> On 08/05/2013 04:37 PM, Jon Olav Skoien wrote:
>>>> Dear list,
>>>> 
>>>> We have a problem which appears to be a bug in either rgdal or raster,
>>>> although it could also be a bug in base R or in our understanding of how
>>>> to deal with connections.
>>>> 
>>>> We have a process which is writing a rather large (~10,000-20,000) number of
>>>> geoTiffs via writeRaster. However, the process has frequently stopped
>>>> with an error of the type:
>>>> Error in .local(.Object, ...) :
>>>>     TIFFOpen:/local0/skoiejo/hri/test.tif: Too many open files
>>>> The issue seems to be the creation of temp-files in the temp directory
>>>> which is given by tempdir(), not by raster:::.tmpdir(). These temp-files
>>>> seem to be created by the call
>>>>     transient <- new("GDALTransientDataset", driver=driver, rows=r@nrows,
>>>>         cols=r@ncols, bands=nbands, type=dataformat, fname=filename,
>>>>         options=options, handle=NULL)
>>>> from raster:::.getGDALtransient
>>>> The temp-files are deleted after writing the geoTiff, but are not
>>>> removed from the list of open files in Linux, which on our system was
>>>> limited to 1024 files (ulimit -n) per process. Below is a script which
>>>> can replicate the issue (takes a few minutes to reach 1024) and
>>>> sessionInfo().
>>>> 
>>>> Currently we are trying to solve the issue by increasing the limit of
>>>> file connections, but we would prefer a solution where the connections
>>>> are properly deleted, either before writeRaster finishes, or a command
>>>> which we can include in our script, either R-code or a call to system().
>>>> The connections are not visible via showConnections(), and
>>>> closeAllConnections() does not help.
>>>> 
>>>> Thanks,
>>>> Jon
>>> 
>>> I stumbled across the same problem (with exactly the same configuration 
>>> reported by Jon with 'sessionInfo()'), while trying to change the values 
>>> of some pixels in more than 6000 maps.
>>> 
>>> Thank you very much Jon for the detailed report about the problem, which 
>>> helped me to find a workaround (so far, just splitting the 6000 maps into 
>>> smaller groups).
>>> 
>>>> 
>>>> 
>>>> library(raster)  # writing GTiff goes through rgdal, which must be installed
>>>> r <- raster(system.file("external/test.grd", package="raster"))
>>>> for (ifile in 1:2000) {
>>>>     writeRaster(r, "test.tif", format = "GTiff", overwrite = TRUE)
>>>>     print(ifile)
>>>> }
>>>> 
>>> 
>>> After trying the previous reproducible code, I don't understand why I got 
>>> the error when ifile=1019 and not 1024:
>>> 
>>> ....
>>> [1] 1018
>>> [1] 1019
>>> Error in .local(.Object, ...) :
>>>  TIFFOpen:/home/hzambran/test.tif: Too many open files
>>> 
>> 
>> The R process already has other files open, which reduces the number of 
>> writes needed to reach the limit. The problem is in the GDAL bindings with 
>> R; I haven't tried to see whether other applications keeping GDAL loaded 
>> face the same issues. GDAL applications typically write once and exit, so 
>> this isn't a problem there.
>> 
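
A rough way to watch the descriptors accumulate, assuming Linux with /proc mounted, is to count the entries under /proc/<pid>/fd for the running R process. This is only a sketch: the helper names count_open_fds() and deleted_but_open() are made up for illustration, and the count includes the descriptor that list.files() itself holds while reading the directory, so treat it as approximate.

```r
# Count the file descriptors currently held by an R process.
# Assumes a Linux-style /proc filesystem; returns NA where it is absent.
count_open_fds <- function(pid = Sys.getpid()) {
  fd_dir <- file.path("/proc", as.character(pid), "fd")
  if (!file.exists(fd_dir)) return(NA_integer_)
  length(list.files(fd_dir))
}

# List descriptors whose target has been unlinked but is still held open.
# On Linux the symlink target of such a descriptor ends in " (deleted)".
deleted_but_open <- function(pid = Sys.getpid()) {
  fd_dir <- file.path("/proc", as.character(pid), "fd")
  if (!file.exists(fd_dir)) return(character(0))
  targets <- Sys.readlink(list.files(fd_dir, full.names = TRUE))
  targets[grepl(" (deleted)", targets, fixed = TRUE)]
}
```

Calling count_open_fds() every hundred iterations or so of the reproducible loop should show whether the count climbs towards the ulimit -n value, and deleted_but_open() should show whether the leftover entries point at temp-files that were already removed.
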
>> The current GDAL.close() code calls unlink() on a vector of files with the 
>> same basename, but unlink() now appears to fail, leaving the files in 
>> place. Using file.remove() leads to the same result, and using 
>> deleteFile() provokes other problems.
>> 
>> This will probably turn out to be something trivial, but will take a great 
>> deal of time to debug, as the consequences of changing the dataset 
>> structure are possibly extensive.
>> 
>> For the time being, the work-around is the only route; if volunteers can 
>> debug this, progress may be possible, but everything else has to continue 
>> to work.
>> 
>> Roger
>
> Roger, thanks for having a look at this.
> I just checked with an older version, and it seems the problem was introduced 
> more or less at the same time as a valgrind issue was fixed in revision 456. 
> Running the example above worked with R 2.14.0 and rgdal 0.8-5 (sessionInfo 
> below), but failed when upgrading rgdal (and sp). The problem seems to be in 
> the C++ code (I already tried to revert the R code of GDAL.close to 0.8-5, 
> without any difference), which I am unfortunately not able to debug.

Thanks. The changes made then were a result of an audit for references to 
de-referenced pointers and memory leaks. I've looked to see whether it is 
possible to revert optionally to the former behaviour (admitting 
references to de-referenced pointers and memory leaks), but I can't see an 
easy resolution. So for now the workarounds you propose are those to 
follow.
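
A minimal sketch of two of those workarounds combined (splitting the job, and running each piece in its own sub-process) might look like the following. It assumes a Unix-alike, since mclapply() relies on fork, and it assumes that descriptors leaked inside a forked child are released when that child exits. The names make_batches(), run_in_batches() and write_one are hypothetical, not part of raster or rgdal.

```r
library(parallel)  # ships with R; mclapply() forks, so Unix-alikes only

# Split the job indices 1..n into consecutive batches of at most batch_size.
make_batches <- function(n, batch_size) {
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / batch_size)))
}

# Run every batch in a separately forked child process.
# write_one is a user-supplied function taking one index and returning
# a single character string (e.g. the file name it wrote).
# With mc.preschedule = FALSE one job is forked per batch, so each child
# starts from the parent's (small) set of open files.
run_in_batches <- function(n, batch_size, write_one, cores = 1L) {
  batches <- make_batches(n, batch_size)
  res <- mclapply(batches,
                  function(b) vapply(b, write_one, character(1)),
                  mc.cores = cores, mc.preschedule = FALSE)
  # Note: a failed child shows up here as a "try-error" element.
  unlist(res, use.names = FALSE)
}

# Hypothetical use with the reproducible example from this thread:
# r <- raster(system.file("external/test.grd", package = "raster"))
# write_one <- function(i) {
#   f <- sprintf("test_%05d.tif", i)
#   writeRaster(r, f, format = "GTiff", overwrite = TRUE)
#   f
# }
# files <- run_in_batches(20000, batch_size = 500, write_one = write_one)
```

The batch size only needs to stay comfortably below the per-process limit reported by ulimit -n; 500 here is an arbitrary choice for a limit of 1024.
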

Roger

>
> As there is no quick fix at the moment, I just thought it would be good to 
> summarize the possible workarounds for other people who encounter this 
> problem:
> - Split the process up into smaller pieces
> - Increase the limit on open files per process (the default on Linux 
> seems to be 1024, but I have not seen any reason against raising it to 
> e.g. 40,000, as currently on our system)
> - Do parallel processing (this works better because each sub-process has 
> its own list of open files).
>
> Best wishes,
> Jon
>
>
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5
>
> loaded via a namespace (and not attached):
> [1] grid_2.14.0    lattice_0.20-0
>
>
>
>
>> 
>>> 
>>> 
>>> Thanks again Jon for sharing your findings about this.
>>> 
>>> All the best,
>>> 
>>> Mauricio Zambrano-Bigiarini, Ph.D
>>> 
>>> 
>>> 
>> 
>
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


