[R-sig-Geo] rgdal: problem reading a bigger raster dataset (R 4.0.0/3.6.3, Ubuntu 20.04)
Thorsten Behrens
thor@ten@m@behren@ @end|ng |rom gm@||@com
Tue Apr 28 16:55:20 CEST 2020
Hi Mathias and all,
thanks! This confirms my test with the grid sizes related to 2^31-1
My current solution is to use
mLargeRaster = read_stars("Raster.tif")[[1]]
instead of
rLargeRaster = raster("Raster.tif")
mLargeRaster = raster::as.matrix(rLargeRaster)
Cheers,
Thorsten
Am 28.04.2020 um 16:29 schrieb Mathias Moser:
> Dear all,
>
> sorry if this is an obvious observation (or even completely wrong), but
> I got interested in what's happening here: I think you are hitting the
> long vector limit of R [1] and from a look at rgdal's code this happens
> because it uses 32-bit types INTSXP/REALSXP in RGDAL_GetRasterData()
> which are limited to 2^31-1 n. Making getRasterData() work for your
> data set would probably require patching rgdal for long vector support.
>
> (read_gdal_data() of stars _seems_ to use R_xlen_t instead to
> accomodate for 52bits [2]).
>
> Best wishes,
>
> Mathias
>
>
> [1]
> https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
> [2]
> https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172
>
>
> On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
>> Roger and Mike,
>>
>> I really appreciate your help on this!
>>
>> I had a look at getRasterData(). It results in the same error. Hence,
>> I
>> made further tests where I compared grids with the following numbers
>> of
>> cols and rows:
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>>
>> and
>>
>> nPx = ceiling(sqrt(2^31 -1)) # 46341
>>
>> The result is clear. Raster data with 46340 * 46340 px can be loaded
>> with getRasterData() and with raster(), raster::as.matrix(), whereas
>> datasets with 46341 * 46341 px cannot and result in the error:
>>
>> Error in getRasterData(gdCeil, band = 1) :
>> long vectors not supported yet: memory.c:3782
>>
>> read_stars() works for both. You find the corresponding code below.
>>
>> Are there any other option I can try?
>>
>> Thorsten
>>
>>
>> Reproducible code:
>>
>> ## generate raster datasets
>> # 46340 * 46340 grid dataset
>> sFileNameFloor = "Floor.tif"
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>> nPx
>>
>> rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rFloor) = 1
>>
>> writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> # 46341 * 46341 grid dataset
>> sFileNameCeil = "Ceil.tif"
>>
>> nPx = ceiling(sqrt(2^31 -1))
>> nPx
>>
>> rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rCeil) = 1
>>
>> writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> ## load raster datasets with different methods
>>
>> # load Ceil
>> gdCeil = GDAL.open(sFileNameCeil)
>> dim(gdCeil)
>>
>> vnCeil = getRasterData(gdCeil, band = 1) # error
>>
>> GDAL.close(gdCeil)
>> str(vnCeil)
>>
>> stCeil = read_stars(sFileNameCeil) # all fine
>> str(stCeil[[1]])
>>
>> rCeil = raster(sFileNameCeil)
>> mCeil = raster::as.matrix(rCeil) # error
>> str(mCeil)
>>
>>
>> # load Floor
>> gdFloor = GDAL.open(sFileNameFloor)
>> dim(gdFloor)
>>
>> vnData = getRasterData(gdFloor, band = 1) # all fine
>>
>> GDAL.close(gdFloor)
>> str(vnData)
>>
>> stFloor= read_stars(sFileNameFloor) # all fine
>> str(stFloor[[1]])
>>
>> rFloor = raster(sFileNameFloor)
>> mFloor = raster::as.matrix(rFloor) # all fine
>> str(mFloor)
>>
>>
>>
>>
>> Am 28.04.2020 um 12:10 schrieb Roger Bivand:
>>> On Tue, 28 Apr 2020, Thorsten Behrens wrote:
>>>
>>>> Michael,
>>>>
>>>> Thanks for the hint, it seems to work! Real-world tests will
>>>> follow in
>>>> the next few days...
>>>>
>>>> So it definitely seems to be a problem of rgdal. It would be
>>>> great if it
>>>> could still be solved.
>>> Not rgdal, but your use of it. Try looking at a sequence of
>>>
>>> file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
>>> obj <- GDAL.open(file)
>>> (dims <- dim(obj))
>>> band <- 1
>>> data_vector <- getRasterData(obj, band)
>>> GDAL.close(obj)
>>> str(data_vector)
>>>
>>> This does not create any more complicated objects, just a matrix.
>>> In
>>> some cases, the rows in the matrix are ordered S -> N, so it may
>>> appear the wrong way up.
>>>
>>> rgdal::getRasterData() is lightweight, and has many arguments
>>> which
>>> may be helpful. rgdal::readGDAL() is heavyweight, creating a
>>> SpatialGridDataFrame object. This involves much copying of data,
>>> but
>>> the output object can be used for example in mapping or analysis
>>> without further conversion. My guess is that
>>> rgdal::getRasterData()
>>> gives you your matrix directly. Look at the R code to see how
>>> as.is=
>>> etc. work (files may include scale and offset values - recently a
>>> user
>>> was confused that scale and offset were "magically" applied to
>>> convert
>>> Uint16 values to degrees Kelvin on reading). For example, if as.is
>>> ==
>>> TRUE or scale == 1 and offset == 0, no copying of the input matrix
>>> occurs because it is not converted. If you could check this route,
>>> others following this thread could also benefit; if I'm wrong,
>>> that
>>> would also be good to know.
>>>
>>> Roger
>>>
>>>
>>>> Best,
>>>>
>>>> Thorsten
>>>>
>>>>
>>>>
>>>> Am 27.04.2020 um 15:58 schrieb Michael Sumner:
>>>>> Try stars it worked for me on a test
>>>>>
>>>>> On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
>>>>> <
>>>>> thorsten.m.behrens using gmail.com
>>>>> <mailto:
>>>>> thorsten.m.behrens using gmail.com
>>>>> wrote:
>>>>>
>>>>> Roger,
>>>>>
>>>>> thanks a lot for your reply!
>>>>>
>>>>> I have 256GB RAM installed (mentioned it somewhere). And
>>>>> there,
>>>>> all is
>>>>> fine when I run:
>>>>>
>>>>> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>> 72000,
>>>>>
>>>>> values(rDemTest) = 1
>>>>>
>>>>> When limiting the memory to about 8GB with
>>>>> ulimit::memory_limit(8000),
>>>>> the max size which can be allocated seems to be around
>>>>> 10000 x
>>>>> 10000px.
>>>>> In this case all tests run fine. Unfortunately it seems to
>>>>> be
>>>>> related to
>>>>> the size of the grid (48000 x 72000) and therefore the
>>>>> problem
>>>>> can't be
>>>>> reproduced on machines with 8GB RAM. For some processing
>>>>> steps I
>>>>> need
>>>>> grids of that size in the memory, which is why I have 256
>>>>> GB
>>>>> installed.
>>>>>
>>>>> Normally, I use the raster package and not
>>>>> rgdal::readGDAL(). This
>>>>> was
>>>>> just a desperate attempt to find the source of the problem.
>>>>>
>>>>> This is what I use in my code:
>>>>>
>>>>> rDem = raster(sFileNameTiff)
>>>>> mDem = raster::as.matrix(rDem)
>>>>>
>>>>> But maybe this is the same...
>>>>>
>>>>> Any further suggestions are much appreciated!
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Best,
>>>>>
>>>>> Thorsten
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Am 27.04.2020 um 14:50 schrieb Roger Bivand:
>>>>> > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
>>>>> >
>>>>> >> Dear all,
>>>>> >>
>>>>> >> my problem is that I want to read a big geotiff raster
>>>>> dataset
>>>>> into R
>>>>> >> and convert it to a matrix, which does not work.
>>>>> >> The file is big but there is sufficient memory. I need
>>>>> all the
>>>>> data
>>>>> >> in the memory at the same time.
>>>>> >>
>>>>> >> The error occurs under R 3.6.3 as well as 4.0.0 using
>>>>> Ubuntu
>>>>> 20.04
>>>>> >> LTS with the latest version of the packages (see session
>>>>> info
>>>>> below)
>>>>> >> and 256GB RAM installed.
>>>>> >>
>>>>> >> When loading the raster dataset using rgdal (via readGDAL
>>>>> or
>>>>> >> raster::readAll) I get the follwoing error in R 4.0.0:
>>>>> >>
>>>>> >> ```
>>>>> >> Error in rgdal::getRasterData(con, offset = offs,
>>>>> region.dim =
>>>>> reg,
>>>>> >> band = object using data@band) :
>>>>> >> long vectors not supported yet: memory.c:3782
>>>>> >> ```
>>>>> >
>>>>> > On a 16GB Fedora platform:
>>>>> >
>>>>> >> library(raster) # 3.1-5
>>>>> >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>> 72000,
>>>>> > 0,
>>>>> > + 48000))) # all fine
>>>>> >> rDemTest
>>>>> > class : RasterLayer
>>>>> > dimensions : 48000, 72000, 3.456e+09 (nrow, ncol, ncell)
>>>>> > resolution : 1, 1 (x, y)
>>>>> > extent : 0, 72000, 0, 48000 (xmin, xmax, ymin, ymax)
>>>>> > crs : NA
>>>>> >
>>>>> >> values(rDemTest) = 1
>>>>> > Error: cannot allocate vector of size 25.7 Gb
>>>>> >
>>>>> > So you are deceiving yourself into thinking that all is
>>>>> fine at
>>>>> this
>>>>> > point. Please try to instantiate an example that can be
>>>>> reproduced on
>>>>> > a machine with 8GB RAM.
>>>>> >
>>>>> > Further note that rgdal::readGDAL() is not how you handle
>>>>> very
>>>>> large
>>>>> > objects in files, and never has been. raster can handle
>>>>> blocks
>>>>> of data
>>>>> > from bands in file; stars and gdalcubes can use proxy=TRUE
>>>>> for the
>>>>> > same purpose. Why did you choose rgdal::readGDAL() when
>>>>> this is
>>>>> not
>>>>> > its purpose?
>>>>> >
>>>>> > You did not say how much RAM is on your platform.
>>>>> >
>>>>> > Roger
>>>>> >
>>>>> >>
>>>>> >> In R 3.6.3 is is "... memory.c:3717"
>>>>> >>
>>>>> >> However, I can load the same file with the tiff package
>>>>> and a
>>>>> file of
>>>>> >> the same size in the native raster package format (*.grd)
>>>>> with
>>>>> the
>>>>> >> raster package but again not with the rgdal package.
>>>>> >>
>>>>> >> gdalinfo (gdalUtils) does not complain (see below).
>>>>> Hence, Even
>>>>> >> Rouault assumes the problem is related to rgdal and not
>>>>> gdal
>>>>> >> (
>>>>> https://github.com/OSGeo/gdal/issues/2442
>>>>> ).
>>>>> >>
>>>>> >> Below you find reproducible code, which generates a
>>>>> raster file,
>>>>> >> saves the two formats (.tiff and .grd) and tries to read
>>>>> them
>>>>> with
>>>>> >> the different packages.
>>>>> >>
>>>>> >> Is this a known limitation? Any help is greatly
>>>>> appreciated!
>>>>> >>
>>>>> >> Thanks a lot in advance!
>>>>> >>
>>>>> >> Best wishes and stay healthy,
>>>>> >> Thorsten
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> ### Steps to reproduce the problem.
>>>>> >>
>>>>> >> R code:
>>>>> >>
>>>>> >> ```
>>>>> >> library(rgdal) # 1.4-8
>>>>> >> library(raster) # 3.1-5
>>>>> >> library(tiff) # 0.1-5
>>>>> >>
>>>>> >> ## generate and manipulate a big raster dataset
>>>>> >> # - generate
>>>>> >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>> >> 72000, 0, 48000))) # all fine
>>>>> >>
>>>>> >> # - manipulate
>>>>> >> values(rDemTest) = 1 # all fine
>>>>> >>
>>>>> >> # - convert
>>>>> >> mDemTest = raster::as.matrix(rDemTest) # all fine
>>>>> >> str(mDemTest)
>>>>> >>
>>>>> >> ## save a big dataset
>>>>> >>
>>>>> >> # - as raster/gdal
>>>>> >> sFileNameTiff = "BigData.tif"
>>>>> >> writeRaster(rDemTest, sFileNameTiff, "GTiff", overwrite =
>>>>> TRUE,
>>>>> >> NAflag = -9999) # all fine
>>>>> >>
>>>>> >> # - as raster native
>>>>> >> sFileNameNative = "BigData.grd"
>>>>> >> writeRaster(rDemTest, sFileNameNative, "raster",
>>>>> overwrite =
>>>>> TRUE,
>>>>> >> NAflag = -9999) # all fine
>>>>> >>
>>>>> >>
>>>>> >> ## load the big raster datasets with different packages
>>>>> and
>>>>> options
>>>>> >> # - load the tiff data with the gdal package via the
>>>>> raster
>>>>> package
>>>>> >> rDem = raster(sFileNameTiff) # all fine
>>>>> >> extent(rDem) # all fine
>>>>> >> mDem = raster::as.matrix(rDem) # error
>>>>> >> rDem = readAll(rDem) # error
>>>>> >>
>>>>> >> # - load the native raster data with the raster package
>>>>> >> rDem = raster(sFileNameNative) # all fine
>>>>> >> extent(rDem) # all fine
>>>>> >> mDem = raster::as.matrix(rDem) # all fine
>>>>> >> str(mDem)
>>>>> >>
>>>>> >> # - load the tiff data with the tiff package
>>>>> >> mDem = readTIFF(sFileNameTiff) # all fine
>>>>> >> str(mDem)
>>>>> >>
>>>>> >> # - load the tiff data with the gdal package
>>>>> >> sfDem = readGDAL(sFileNameTiff) # error
>>>>> >>
>>>>> >> # - load the native raster data with the gdal package
>>>>> >> sfDem = readGDAL(sFileNameNative) # error
>>>>> >>
>>>>> >> ```
>>>>> >>
>>>>> >>
>>>>> >> ### Startup messages when rgdal is attached (requested by
>>>>> Roger
>>>>> Bivand)
>>>>> >>> library(rgdal)
>>>>> >> rgdal: version: 1.4-8, (SVN revision 845)
>>>>> >> Geospatial Data Abstraction Library extensions to R
>>>>> successfully loaded
>>>>> >> Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
>>>>> >> Path to GDAL shared files:
>>>>> >> GDAL binary built with GEOS: TRUE
>>>>> >> Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020,
>>>>> [PJ_VERSION:
>>>>> >> 631]
>>>>> >> Path to PROJ.4 shared files: (autodetected)
>>>>> >> Linking to sp version: 1.4-1
>>>>> >>
>>>>> >>
>>>>> >> ### Session info
>>>>> >>> sessionInfo()
>>>>> >> R version 4.0.0 (2020-04-24)
>>>>> >> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>> >> Running under: Ubuntu 20.04 LTS
>>>>> >>
>>>>> >> Matrix products: default
>>>>> >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/libblas.so.3
>>>>> >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/liblapack.so.3
>>>>> >>
>>>>> >> locale:
>>>>> >> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>>>>> LC_TIME=de_DE.UTF-8
>>>>> >> [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
>>>>> >> LC_MESSAGES=de_DE.UTF-8
>>>>> >> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C
>>>>> >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
>>>>> >> LC_IDENTIFICATION=C
>>>>> >>
>>>>> >> attached base packages:
>>>>> >> [1] stats graphics grDevices utils datasets methods
>>>>> base
>>>>> >>
>>>>> >> other attached packages:
>>>>> >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8 tiff_0.1-5
>>>>> raster_3.1-5
>>>>> >> sp_1.4-1
>>>>> >>
>>>>> >> loaded via a namespace (and not attached):
>>>>> >> [1] compiler_4.0.0 tools_4.0.0 Rcpp_1.0.4.6
>>>>> >> R.methodsS3_1.8.0 codetools_0.2-16
>>>>> >> [6] grid_4.0.0 iterators_1.0.12 foreach_1.5.0
>>>>> >> R.utils_2.9.2 R.oo_1.23.0
>>>>> >> [11] lattice_0.20-41
>>>>> >>
>>>>> >>
>>>>> >> ### gdalInfo
>>>>> >>> gdalinfo(sFileNameTiff)
>>>>> >> [1] "Driver: GTiff/GeoTIFF"
>>>>> >> [2] "Files: BigData.tif"
>>>>> >> [3] "Size is 72000, 48000"
>>>>> >> [4] "Origin = (0.000000000000000,48000.000000000000000)"
>>>>> >> [5] "Pixel Size = (1.000000000000000,-
>>>>> 1.000000000000000)"
>>>>> >> [6] "Image Structure Metadata:"
>>>>> >> [7] " COMPRESSION=LZW"
>>>>> >> [8] " INTERLEAVE=BAND"
>>>>> >> [9] "Corner Coordinates:"
>>>>> >> [10] "Upper Left ( 0.000, 48000.000) "
>>>>> >> [11] "Lower Left ( 0.0000000, 0.0000000) "
>>>>> >> [12] "Upper Right ( 72000.000, 48000.000) "
>>>>> >> [13] "Lower Right ( 72000.000, 0.000) "
>>>>> >> [14] "Center ( 36000.000, 24000.000) "
>>>>> >> [15] "Band 1 Block=72000x1 Type=Float32,
>>>>> ColorInterp=Gray"
>>>>> >> [16] " Min=1.000 Max=1.000 "
>>>>> >> [17] " Minimum=1.000, Maximum=1.000, Mean=nan,
>>>>> StdDev=nan"
>>>>> >> [18] " NoData Value=-9999"
>>>>> >> [19] " Metadata:"
>>>>> >> [20] " STATISTICS_MAXIMUM=1"
>>>>> >> [21] " STATISTICS_MEAN=nan"
>>>>> >> [22] " STATISTICS_MINIMUM=1"
>>>>> >> [23] " STATISTICS_STDDEV=nan"
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> R-sig-Geo mailing list
>>>>> >>
>>>>> R-sig-Geo using r-project.org
>>>>> <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>> >>
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>> >>
>>>>> >>
>>>>> >
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>>
>>>>> R-sig-Geo using r-project.org
>>>>> <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>>
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo using r-project.org
>>>>
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
More information about the R-sig-Geo
mailing list