[R-sig-Geo] raster/rgdal- problem: Too many open files (Linux)
Roger Bivand
Roger.Bivand at nhh.no
Wed Aug 7 11:49:20 CEST 2013
On Wed, 7 Aug 2013, Jon Olav Skoien wrote:
>
> On 06-Aug-13 21:35, Roger Bivand wrote:
>> On Tue, 6 Aug 2013, Mauricio Zambrano-Bigiarini wrote:
>>
>>> On 08/05/2013 04:37 PM, Jon Olav Skoien wrote:
>>>> Dear list,
>>>>
>>>> We have a problem which appears to be a bug in either rgdal or raster,
>>>> although it could also be a bug in base R or in our understanding of how
>>>> to deal with connections.
>>>>
>>>> We have a process which writes a rather large number (~10,000-20,000) of
>>>> GeoTIFFs via writeRaster. However, the process has frequently stopped
>>>> with an error of the type:
>>>> Error in .local(.Object, ...) :
>>>> TIFFOpen:/local0/skoiejo/hri/test.tif: Too many open files
>>>> The issue seems to be the creation of temp-files in the temp directory
>>>> which is given by tempdir(), not by raster:::.tmpdir(). These temp-files
>>>> seem to be created by the call
>>>> transient <- new("GDALTransientDataset", driver=driver, rows=r@nrows,
>>>> cols=r@ncols, bands=nbands, type=dataformat, fname=filename,
>>>> options=options, handle=NULL)
>>>> from raster:::.getGDALtransient
>>>> The temp-files are deleted after writing the geoTiff, but are not
>>>> removed from the list of open files in Linux, which on our system was
>>>> limited to 1024 files (ulimit -n) per process. Below is a script which
>>>> can replicate the issue (takes a few minutes to reach 1024) and
>>>> sessionInfo().
>>>>
>>>> Currently we are trying to solve the issue by increasing the limit of
>>>> file connections, but we would prefer a solution where the connections
>>>> are properly deleted, either before writeRaster finishes, or a command
>>>> which we can include in our script, either R code or a call to system().
>>>> The connections are not visible via showConnections(), and
>>>> closeAllConnections() does not help.
>>>>
>>>> Thanks,
>>>> Jon
>>>
>>> I stumbled across the same problem (with exactly the same configuration
>>> reported by Jon with 'sessionInfo()'), while trying to change the values
>>> of some pixels in more than 6000 maps.
>>>
>>> Thank you very much Jon for the detailed report about the problem, which
>>> helped me to find a workaround to this problem (so far, just to split the
>>> 6000 maps in smaller groups).
>>>
>>>>
>>>>
>>>> r <- raster(system.file("external/test.grd", package="raster"))
>>>> for (ifile in 1:2000) {
>>>> writeRaster(r, "test.tif", format = "GTiff", overwrite = TRUE)
>>>> print(ifile)
>>>> }
>>>>
>>>
>>> After trying the previous reproducible code, I don't understand why I got
>>> the error when ifile=1019 and not 1024:
>>>
>>> ....
>>> [1] 1018
>>> [1] 1019
>>> Error in .local(.Object, ...) :
>>> TIFFOpen:/home/hzambran/test.tif: Too many open files
>>>
>>
>> The R process already has other files open, and these count against the
>> limit, so fewer than 1024 writes are needed to reach it. The problem is in
>> the R bindings to GDAL; I haven't tried to see whether other applications
>> that keep GDAL loaded face the same issue. GDAL applications typically write
>> once and exit, so this isn't a problem there.
>>
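The point about descriptors already held by the R process can be checked from within R. What follows is only a minimal sketch, assuming a Linux system with /proc mounted; count_open_fds() and deleted_but_open() are illustrative helper names, not functions from raster, rgdal or base R. The first counts the descriptors a process currently holds, the second lists those that still point at a file which has since been unlinked.

```r
# Illustrative helpers (not part of raster or rgdal); Linux only, needs /proc.

# Number of file descriptors currently held by a process.
# The count may be one too high, because listing the directory
# itself briefly uses a descriptor.
count_open_fds <- function(pid = Sys.getpid()) {
  fd_dir <- file.path("/proc", as.character(pid), "fd")
  if (!file.exists(fd_dir)) return(NA_integer_)  # no /proc: probably not Linux
  length(list.files(fd_dir))
}

# Targets of descriptors whose file was deleted while still held open.
# On Linux the link target of such a descriptor ends in " (deleted)".
deleted_but_open <- function(pid = Sys.getpid()) {
  fd_dir <- file.path("/proc", as.character(pid), "fd")
  targets <- Sys.readlink(list.files(fd_dir, full.names = TRUE))
  targets[grepl(" (deleted)", targets, fixed = TRUE)]
}
```

Calling count_open_fds() once in a fresh session and again after a number of writeRaster() calls should show the baseline and how fast the count grows; deleted_but_open() may help to confirm whether the growth comes from the removed temp-files.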
>> The current GDAL.close() code says unlink() to a vector of files with the
>> same basename, but actually unlink() now appears to fail, leaving the files
>> in place. Using file.remove() leads to the same result, and using
>> deleteFile() provokes other problems.
>>
>> This will probably turn out to be something trivial, but will take a great
>> deal of time to debug, as the consequences of changing the dataset
>> structure are possibly extensive.
>>
>> For the time being, the work-around is the only route; if volunteers can
>> debug this, progress may be possible, but everything else has to continue
>> to work.
>>
>> Roger
>
> Roger, thanks for having a look at this.
> I just checked with an older version, and it seems the problem was introduced
> more or less at the same time as a valgrind issue was fixed in revision 456.
> Running the example above worked with R 2.14.0 and rgdal 0.8-5 (sessionInfo
> below), but failed when upgrading rgdal (and sp). The problem seems to be in
> the C++ code (I already tried to revert the R code of GDAL.close to 0.8-5,
> without any difference), which I am unfortunately not able to debug.
Thanks. The changes made then were a result of an audit for references to
de-referenced pointers and memory leaks. I've looked to see whether it is
possible to revert optionally to the former behaviour (admitting
references to de-referenced pointers and memory leaks), but I can't see an
easy resolution. So for now the workarounds you propose are those to
follow.
Roger
>
> As there is no quick fix at the moment, I just thought it would be good to
> summarize the possible workarounds for other people who encounter this
> problem:
> - Split the process up into smaller problems
> - Increase the number of possible file connections (the default on Linux
> seems to be 1024, but I have not seen any reason for not increasing this to
> e.g. 40,000, as currently on our system)
> - Do parallel processing (will work better as each sub-process will have its
> own list of file connections).
>
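The first and third workarounds in the list above can be combined by running each batch in its own short-lived R process, so that any descriptors left open are released when that process exits. The code below is a hedged sketch, not a tested fix: split_batches() and run_in_batches() are hypothetical helpers, and `script` is assumed to be a user-written R script that reads the first and last job index from commandArgs(trailingOnly = TRUE) and writes the corresponding files. The default batch size of 500 assumes roughly one leaked descriptor per writeRaster() call and a limit of 1024.

```r
# Hypothetical helpers for batching work across fresh R processes.

# Split a vector of job indices into consecutive batches of at most batch_size.
split_batches <- function(ids, batch_size = 500L) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Run each batch in a new Rscript process; descriptors held by a batch
# are released by the operating system when that process exits.
run_in_batches <- function(ids, script, batch_size = 500L) {
  rscript <- file.path(R.home("bin"), "Rscript")
  for (b in split_batches(ids, batch_size)) {
    # The (assumed) script receives the first and last index of the batch.
    status <- system2(rscript, c(shQuote(script), min(b), max(b)))
    if (status != 0L) stop("batch starting at ", min(b), " failed")
  }
  invisible(TRUE)
}
```

For example, run_in_batches(1:20000, "write_batch.R") would start 40 R processes one after another, where write_batch.R is a placeholder name for the user's own script. The batches run sequentially here; they could equally be distributed over several workers, since each process has its own descriptor table.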
> Best wishes,
> Jon
>
>
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rgdal_0.8-5 raster_2.0-41 sp_1.0-5
>
> loaded via a namespace (and not attached):
> [1] grid_2.14.0 lattice_0.20-0
>
>
>
>
>>
>>>
>>>
>>> Thanks again Jon for sharing your findings about this.
>>>
>>> All the best,
>>>
>>> Mauricio Zambrano-Bigiarini, Ph.D
>>>
>>>
>>>
>>
>
>
>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no