[R-sig-Geo] rgdal: problem reading a bigger raster dataset (R 4.0.0/3.6.3, Ubuntu 20.04)

Thorsten Behrens thor@ten@m@behren@ @end|ng |rom gm@||@com
Tue Apr 28 16:55:20 CEST 2020


Hi Mathias and all,

thanks! This confirms my test with the grid sizes related to 2^31-1

My current solution is to use

mLargeRaster = read_stars("Raster.tif")[[1]]

instead of

rLargeRaster = raster("Raster.tif")
mLargeRaster = raster::as.matrix(rLargeRaster)

Cheers,
Thorsten



Am 28.04.2020 um 16:29 schrieb Mathias Moser:
> Dear all,
>
> sorry if this is an obvious observation (or even completely wrong), but
> I got interested in what's happening here: I think you are hitting the
> long vector limit of R [1] and from a look at rgdal's code this happens
> because it uses 32-bit types INTSXP/REALSXP in RGDAL_GetRasterData()
> which are limited to 2^31-1 n. Making getRasterData() work for your
> data set would probably require patching rgdal for long vector support.
>
> (read_gdal_data() of stars _seems_ to use R_xlen_t instead to
> accomodate for 52bits [2]).
>
> Best wishes,
>
> Mathias
>
>
> [1]
> https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
> [2]
> https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172
>
>
> On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
>> Roger and Mike,
>>
>> I really appreciate your help on this!
>>
>> I had a look at getRasterData(). It results in the same error. Hence,
>> I
>> made further tests where I compared grids with the following numbers
>> of
>> cols and rows:
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>>
>> and
>>
>> nPx = ceiling(sqrt(2^31 -1)) # 46341
>>
>> The result is clear. Raster data with 46340 * 46340 px can be loaded
>> with getRasterData() and with raster(), raster::as.matrix(), whereas
>> datasets with 46341 * 46341 px cannot and result in the error:
>>
>> Error in getRasterData(gdCeil, band = 1) :
>> long vectors not supported yet: memory.c:3782
>>
>> read_stars() works for both. You find the corresponding code below.
>>
>> Are there any other option I can try?
>>
>> Thorsten
>>
>>
>> Reproducible code:
>>
>> ## generate raster datasets
>> # 46340 * 46340 grid dataset
>> sFileNameFloor = "Floor.tif"
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>> nPx
>>
>> rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rFloor) = 1
>>
>> writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> # 46341 * 46341 grid dataset
>> sFileNameCeil = "Ceil.tif"
>>
>> nPx = ceiling(sqrt(2^31 -1))
>> nPx
>>
>> rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rCeil) = 1
>>
>> writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> ## load raster datasets with different methods
>>
>> # load Ceil
>> gdCeil = GDAL.open(sFileNameCeil)
>> dim(gdCeil)
>>
>> vnCeil = getRasterData(gdCeil, band = 1) # error
>>
>> GDAL.close(gdCeil)
>> str(vnCeil)
>>
>> stCeil = read_stars(sFileNameCeil) # all fine
>> str(stCeil[[1]])
>>
>> rCeil = raster(sFileNameCeil)
>> mCeil = raster::as.matrix(rCeil) # error
>> str(mCeil)
>>
>>
>> # load Floor
>> gdFloor = GDAL.open(sFileNameFloor)
>> dim(gdFloor)
>>
>> vnData = getRasterData(gdFloor, band = 1) # all fine
>>
>> GDAL.close(gdFloor)
>> str(vnData)
>>
>> stFloor= read_stars(sFileNameFloor) # all fine
>> str(stFloor[[1]])
>>
>> rFloor = raster(sFileNameFloor)
>> mFloor = raster::as.matrix(rFloor) # all fine
>> str(mFloor)
>>
>>
>>
>>
>> Am 28.04.2020 um 12:10 schrieb Roger Bivand:
>>> On Tue, 28 Apr 2020, Thorsten Behrens wrote:
>>>
>>>> Michael,
>>>>
>>>> Thanks for the hint, it seems to work! Real-world tests will
>>>> follow in
>>>> the next few days...
>>>>
>>>> So it definitely seems to be a problem of rgdal. It would be
>>>> great if it
>>>> could still be solved.
>>> Not rgdal, but your use of it. Try looking at a sequence of
>>>
>>> file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
>>> obj <- GDAL.open(file)
>>> (dims <- dim(obj))
>>> band <- 1
>>> data_vector <- getRasterData(obj, band)
>>> GDAL.close(obj)
>>> str(data_vector)
>>>
>>> This does not create any more complicated objects, just a matrix.
>>> In
>>> some cases, the rows in the matrix are ordered S -> N, so it may
>>> appear the wrong way up.
>>>
>>> rgdal::getRasterData() is lightweight, and has many arguments
>>> which
>>> may be helpful. rgdal::readGDAL() is heavyweight, creating a
>>> SpatialGridDataFrame object. This involves much copying of data,
>>> but
>>> the output object can be used for example in mapping or analysis
>>> without further conversion. My guess is that
>>> rgdal::getRasterData()
>>> gives you your matrix directly. Look at the R code to see how
>>> as.is=
>>> etc. work (files may include scale and offset values - recently a
>>> user
>>> was confused that scale and offset were "magically" applied to
>>> convert
>>> Uint16 values to degrees Kelvin on reading). For example, if as.is
>>> ==
>>> TRUE or scale == 1 and offset == 0, no copying of the input matrix
>>> occurs because it is not converted. If you could check this route,
>>> others following this thread could also benefit; if I'm wrong,
>>> that
>>> would also be good to know.
>>>
>>> Roger
>>>
>>>
>>>> Best,
>>>>
>>>> Thorsten
>>>>
>>>>
>>>>
>>>> Am 27.04.2020 um 15:58 schrieb Michael Sumner:
>>>>> Try stars it worked for me on a test
>>>>>
>>>>> On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
>>>>> <
>>>>> thorsten.m.behrens using gmail.com
>>>>>   <mailto:
>>>>> thorsten.m.behrens using gmail.com
>>>>> wrote:
>>>>>
>>>>>      Roger,
>>>>>
>>>>>      thanks a lot for your reply!
>>>>>
>>>>>      I have 256GB RAM installed (mentioned it somewhere). And
>>>>> there,
>>>>>      all is
>>>>>      fine when I run:
>>>>>
>>>>>      rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>> 72000,
>>>>>
>>>>>      values(rDemTest) = 1
>>>>>
>>>>>      When limiting the memory to about 8GB with
>>>>>      ulimit::memory_limit(8000),
>>>>>      the max size which can be allocated seems to be around
>>>>> 10000 x
>>>>>      10000px.
>>>>>      In this case all tests run fine. Unfortunately it seems to
>>>>> be
>>>>>      related to
>>>>>      the size of the grid (48000 x 72000) and therefore the
>>>>> problem
>>>>>      can't be
>>>>>      reproduced on machines with 8GB RAM. For some processing
>>>>> steps I
>>>>> need
>>>>>      grids of that size in the memory, which is why I have 256
>>>>> GB
>>>>>      installed.
>>>>>
>>>>>      Normally, I use the raster package and not
>>>>> rgdal::readGDAL(). This
>>>>>      was
>>>>>      just a desperate attempt to find the source of the problem.
>>>>>
>>>>>      This is what I use in my code:
>>>>>
>>>>>      rDem = raster(sFileNameTiff)
>>>>>      mDem = raster::as.matrix(rDem)
>>>>>
>>>>>      But maybe this is the same...
>>>>>
>>>>>      Any further suggestions are much appreciated!
>>>>>
>>>>>      Thanks again!
>>>>>
>>>>>      Best,
>>>>>
>>>>>      Thorsten
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>      Am 27.04.2020 um 14:50 schrieb Roger Bivand:
>>>>>     > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
>>>>>     >
>>>>>     >> Dear all,
>>>>>     >>
>>>>>     >> my problem is that I want to read a big geotiff raster
>>>>> dataset
>>>>>      into R
>>>>>     >> and convert it to a matrix, which does not work.
>>>>>     >> The file is big but there is sufficient memory. I need
>>>>> all the
>>>>>      data
>>>>>     >> in the memory at the same time.
>>>>>     >>
>>>>>     >> The error occurs under R 3.6.3 as well as 4.0.0 using
>>>>> Ubuntu
>>>>> 20.04
>>>>>     >> LTS with the latest version of the packages (see session
>>>>> info
>>>>>      below)
>>>>>     >> and 256GB RAM installed.
>>>>>     >>
>>>>>     >> When loading the raster dataset using rgdal (via readGDAL
>>>>> or
>>>>>     >> raster::readAll) I get the follwoing error in R 4.0.0:
>>>>>     >>
>>>>>     >> ```
>>>>>     >> Error in rgdal::getRasterData(con, offset = offs,
>>>>> region.dim =
>>>>>      reg,
>>>>>     >> band = object using data@band) :
>>>>>     >>   long vectors not supported yet: memory.c:3782
>>>>>     >> ```
>>>>>     >
>>>>>     > On a 16GB Fedora platform:
>>>>>     >
>>>>>     >> library(raster) # 3.1-5
>>>>>     >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>>      72000,
>>>>>     > 0,
>>>>>     > + 48000))) # all fine
>>>>>     >> rDemTest
>>>>>     > class      : RasterLayer
>>>>>     > dimensions : 48000, 72000, 3.456e+09  (nrow, ncol, ncell)
>>>>>     > resolution : 1, 1  (x, y)
>>>>>     > extent     : 0, 72000, 0, 48000  (xmin, xmax, ymin, ymax)
>>>>>     > crs        : NA
>>>>>     >
>>>>>     >> values(rDemTest) = 1
>>>>>     > Error: cannot allocate vector of size 25.7 Gb
>>>>>     >
>>>>>     > So you are deceiving yourself into thinking that all is
>>>>> fine at
>>>>>      this
>>>>>     > point. Please try to instantiate an example that can be
>>>>>      reproduced on
>>>>>     > a machine with 8GB RAM.
>>>>>     >
>>>>>     > Further note that rgdal::readGDAL() is not how you handle
>>>>> very
>>>>>      large
>>>>>     > objects in files, and never has been. raster can handle
>>>>> blocks
>>>>>      of data
>>>>>     > from bands in file; stars and gdalcubes can use proxy=TRUE
>>>>> for the
>>>>>     > same purpose. Why did you choose rgdal::readGDAL() when
>>>>> this is
>>>>> not
>>>>>     > its purpose?
>>>>>     >
>>>>>     > You did not say how much RAM is on your platform.
>>>>>     >
>>>>>     > Roger
>>>>>     >
>>>>>     >>
>>>>>     >> In R 3.6.3 is is "... memory.c:3717"
>>>>>     >>
>>>>>     >> However, I can load the same file with the tiff package
>>>>> and a
>>>>>      file of
>>>>>     >> the same size in the native raster package format (*.grd)
>>>>> with
>>>>> the
>>>>>     >> raster package but again not with the rgdal package.
>>>>>     >>
>>>>>     >> gdalinfo (gdalUtils) does not complain (see below).
>>>>> Hence, Even
>>>>>     >> Rouault assumes the problem is related to rgdal and not
>>>>> gdal
>>>>>     >> (
>>>>> https://github.com/OSGeo/gdal/issues/2442
>>>>> ).
>>>>>     >>
>>>>>     >> Below you find reproducible code, which generates a
>>>>> raster file,
>>>>>     >> saves the two formats (.tiff and .grd) and tries to read
>>>>> them
>>>>> with
>>>>>     >> the different packages.
>>>>>     >>
>>>>>     >> Is this a known limitation? Any help is greatly
>>>>> appreciated!
>>>>>     >>
>>>>>     >> Thanks a lot in advance!
>>>>>     >>
>>>>>     >> Best wishes and stay healthy,
>>>>>     >> Thorsten
>>>>>     >>
>>>>>     >>
>>>>>     >>
>>>>>     >> ### Steps to reproduce the problem.
>>>>>     >>
>>>>>     >> R code:
>>>>>     >>
>>>>>     >> ```
>>>>>     >> library(rgdal) # 1.4-8
>>>>>     >> library(raster) # 3.1-5
>>>>>     >> library(tiff) # 0.1-5
>>>>>     >>
>>>>>     >> ## generate and manipulate a big raster dataset
>>>>>     >> # - generate
>>>>>     >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>>     >> 72000, 0, 48000))) # all fine
>>>>>     >>
>>>>>     >> # - manipulate
>>>>>     >> values(rDemTest) = 1 # all fine
>>>>>     >>
>>>>>     >> # - convert
>>>>>     >> mDemTest = raster::as.matrix(rDemTest) # all fine
>>>>>     >> str(mDemTest)
>>>>>     >>
>>>>>     >> ## save a big dataset
>>>>>     >>
>>>>>     >> # - as raster/gdal
>>>>>     >> sFileNameTiff = "BigData.tif"
>>>>>     >> writeRaster(rDemTest, sFileNameTiff, "GTiff", overwrite =
>>>>> TRUE,
>>>>>     >> NAflag = -9999) # all fine
>>>>>     >>
>>>>>     >> # - as raster native
>>>>>     >> sFileNameNative = "BigData.grd"
>>>>>     >> writeRaster(rDemTest, sFileNameNative, "raster",
>>>>> overwrite =
>>>>> TRUE,
>>>>>     >> NAflag = -9999) # all fine
>>>>>     >>
>>>>>     >>
>>>>>     >> ## load the big raster datasets with different packages
>>>>> and
>>>>> options
>>>>>     >> # - load the tiff data with the gdal package via the
>>>>> raster
>>>>> package
>>>>>     >> rDem = raster(sFileNameTiff) # all fine
>>>>>     >> extent(rDem) # all fine
>>>>>     >> mDem = raster::as.matrix(rDem) # error
>>>>>     >> rDem = readAll(rDem) # error
>>>>>     >>
>>>>>     >> # - load the native raster data with the raster package
>>>>>     >> rDem = raster(sFileNameNative) # all fine
>>>>>     >> extent(rDem) # all fine
>>>>>     >> mDem = raster::as.matrix(rDem) # all fine
>>>>>     >> str(mDem)
>>>>>     >>
>>>>>     >> # - load the tiff data with the tiff package
>>>>>     >> mDem = readTIFF(sFileNameTiff) # all fine
>>>>>     >> str(mDem)
>>>>>     >>
>>>>>     >> # - load the tiff data with the gdal package
>>>>>     >> sfDem = readGDAL(sFileNameTiff) # error
>>>>>     >>
>>>>>     >> # - load the native raster data with the gdal package
>>>>>     >> sfDem = readGDAL(sFileNameNative) # error
>>>>>     >>
>>>>>     >> ```
>>>>>     >>
>>>>>     >>
>>>>>     >> ### Startup messages when rgdal is attached (requested by
>>>>> Roger
>>>>>      Bivand)
>>>>>     >>>  library(rgdal)
>>>>>     >> rgdal: version: 1.4-8, (SVN revision 845)
>>>>>     >>  Geospatial Data Abstraction Library extensions to R
>>>>>      successfully loaded
>>>>>     >>  Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
>>>>>     >>  Path to GDAL shared files:
>>>>>     >>  GDAL binary built with GEOS: TRUE
>>>>>     >>  Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020,
>>>>>      [PJ_VERSION:
>>>>>     >> 631]
>>>>>     >>  Path to PROJ.4 shared files: (autodetected)
>>>>>     >>  Linking to sp version: 1.4-1
>>>>>     >>
>>>>>     >>
>>>>>     >> ### Session info
>>>>>     >>>  sessionInfo()
>>>>>     >> R version 4.0.0 (2020-04-24)
>>>>>     >> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>     >> Running under: Ubuntu 20.04 LTS
>>>>>     >>
>>>>>     >> Matrix products: default
>>>>>     >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/libblas.so.3
>>>>>     >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/liblapack.so.3
>>>>>     >>
>>>>>     >> locale:
>>>>>     >>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>>>>> LC_TIME=de_DE.UTF-8
>>>>>     >>  [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
>>>>>     >> LC_MESSAGES=de_DE.UTF-8
>>>>>     >>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C LC_ADDRESS=C
>>>>>     >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
>>>>>     >> LC_IDENTIFICATION=C
>>>>>     >>
>>>>>     >> attached base packages:
>>>>>     >> [1] stats     graphics  grDevices utils datasets methods
>>>>> base
>>>>>     >>
>>>>>     >> other attached packages:
>>>>>     >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8       tiff_0.1-5
>>>>> raster_3.1-5
>>>>>     >> sp_1.4-1
>>>>>     >>
>>>>>     >> loaded via a namespace (and not attached):
>>>>>     >>  [1] compiler_4.0.0    tools_4.0.0 Rcpp_1.0.4.6
>>>>>     >> R.methodsS3_1.8.0 codetools_0.2-16
>>>>>     >>  [6] grid_4.0.0        iterators_1.0.12 foreach_1.5.0
>>>>>     >> R.utils_2.9.2     R.oo_1.23.0
>>>>>     >> [11] lattice_0.20-41
>>>>>     >>
>>>>>     >>
>>>>>     >> ### gdalInfo
>>>>>     >>>  gdalinfo(sFileNameTiff)
>>>>>     >>  [1] "Driver: GTiff/GeoTIFF"
>>>>>     >>  [2] "Files: BigData.tif"
>>>>>     >>  [3] "Size is 72000, 48000"
>>>>>     >>  [4] "Origin = (0.000000000000000,48000.000000000000000)"
>>>>>     >>  [5] "Pixel Size = (1.000000000000000,-
>>>>> 1.000000000000000)"
>>>>>     >>  [6] "Image Structure Metadata:"
>>>>>     >>  [7] "  COMPRESSION=LZW"
>>>>>     >>  [8] "  INTERLEAVE=BAND"
>>>>>     >>  [9] "Corner Coordinates:"
>>>>>     >> [10] "Upper Left  (       0.000,   48000.000) "
>>>>>     >> [11] "Lower Left  (   0.0000000,   0.0000000) "
>>>>>     >> [12] "Upper Right (   72000.000,   48000.000) "
>>>>>     >> [13] "Lower Right (   72000.000,       0.000) "
>>>>>     >> [14] "Center      (   36000.000,   24000.000) "
>>>>>     >> [15] "Band 1 Block=72000x1 Type=Float32,
>>>>> ColorInterp=Gray"
>>>>>     >> [16] "  Min=1.000 Max=1.000 "
>>>>>     >> [17] "  Minimum=1.000, Maximum=1.000, Mean=nan,
>>>>> StdDev=nan"
>>>>>     >> [18] "  NoData Value=-9999"
>>>>>     >> [19] "  Metadata:"
>>>>>     >> [20] "    STATISTICS_MAXIMUM=1"
>>>>>     >> [21] "    STATISTICS_MEAN=nan"
>>>>>     >> [22] "    STATISTICS_MINIMUM=1"
>>>>>     >> [23] "    STATISTICS_STDDEV=nan"
>>>>>     >>
>>>>>     >> _______________________________________________
>>>>>     >> R-sig-Geo mailing list
>>>>>     >>
>>>>> R-sig-Geo using r-project.org
>>>>>   <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>>     >>
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>     >>
>>>>>     >>
>>>>>     >
>>>>>
>>>>>      _______________________________________________
>>>>>      R-sig-Geo mailing list
>>>>>      
>>>>> R-sig-Geo using r-project.org
>>>>>   <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>>      
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>>      [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo using r-project.org
>>>>
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



More information about the R-sig-Geo mailing list