[R-sig-Geo] rgdal: problem reading a bigger raster dataset (R 4.0.0/3.6.3, Ubuntu 20.04)

Mathias Moser m@tmo@er @end|ng |rom wu@@c@@t
Tue Apr 28 16:29:52 CEST 2020


Dear all,

sorry if this is an obvious observation (or even completely wrong), but
I got interested in what's happening here: I think you are hitting the
long vector limit of R [1] and from a look at rgdal's code this happens
because it uses 32-bit types INTSXP/REALSXP in RGDAL_GetRasterData()
which are limited to 2^31-1 n. Making getRasterData() work for your
data set would probably require patching rgdal for long vector support.

(read_gdal_data() of stars _seems_ to use R_xlen_t instead to
accomodate for 52bits [2]).

Best wishes,

Mathias


[1] 
https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
[2] 
https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172


On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
> Roger and Mike,
> 
> I really appreciate your help on this!
> 
> I had a look at getRasterData(). It results in the same error. Hence,
> I 
> made further tests where I compared grids with the following numbers
> of 
> cols and rows:
> 
> nPx = floor(sqrt(2^31 -1)) # 46340
> 
> and
> 
> nPx = ceiling(sqrt(2^31 -1)) # 46341
> 
> The result is clear. Raster data with 46340 * 46340 px can be loaded 
> with getRasterData() and with raster(), raster::as.matrix(), whereas 
> datasets with 46341 * 46341 px cannot and result in the error:
> 
> Error in getRasterData(gdCeil, band = 1) :
> long vectors not supported yet: memory.c:3782
> 
> read_stars() works for both. You find the corresponding code below.
> 
> Are there any other option I can try?
> 
> Thorsten
> 
> 
> Reproducible code:
> 
> ## generate raster datasets
> # 46340 * 46340 grid dataset
> sFileNameFloor = "Floor.tif"
> 
> nPx = floor(sqrt(2^31 -1)) # 46340
> nPx
> 
> rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
> nPx)))
> values(rFloor) = 1
> 
> writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE, NAflag
> = 
> -9999)
> 
> 
> # 46341 * 46341 grid dataset
> sFileNameCeil = "Ceil.tif"
> 
> nPx = ceiling(sqrt(2^31 -1))
> nPx
> 
> rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
> nPx)))
> values(rCeil) = 1
> 
> writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE, NAflag
> = 
> -9999)
> 
> 
> ## load raster datasets with different methods
> 
> # load Ceil
> gdCeil = GDAL.open(sFileNameCeil)
> dim(gdCeil)
> 
> vnCeil = getRasterData(gdCeil, band = 1) # error
> 
> GDAL.close(gdCeil)
> str(vnCeil)
> 
> stCeil = read_stars(sFileNameCeil) # all fine
> str(stCeil[[1]])
> 
> rCeil = raster(sFileNameCeil)
> mCeil = raster::as.matrix(rCeil) # error
> str(mCeil)
> 
> 
> # load Floor
> gdFloor = GDAL.open(sFileNameFloor)
> dim(gdFloor)
> 
> vnData = getRasterData(gdFloor, band = 1) # all fine
> 
> GDAL.close(gdFloor)
> str(vnData)
> 
> stFloor= read_stars(sFileNameFloor) # all fine
> str(stFloor[[1]])
> 
> rFloor = raster(sFileNameFloor)
> mFloor = raster::as.matrix(rFloor) # all fine
> str(mFloor)
> 
> 
> 
> 
> Am 28.04.2020 um 12:10 schrieb Roger Bivand:
> > On Tue, 28 Apr 2020, Thorsten Behrens wrote:
> > 
> > > Michael,
> > > 
> > > Thanks for the hint, it seems to work! Real-world tests will
> > > follow in
> > > the next few days...
> > > 
> > > So it definitely seems to be a problem of rgdal. It would be
> > > great if it
> > > could still be solved.
> > 
> > Not rgdal, but your use of it. Try looking at a sequence of
> > 
> > file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
> > obj <- GDAL.open(file)
> > (dims <- dim(obj))
> > band <- 1
> > data_vector <- getRasterData(obj, band)
> > GDAL.close(obj)
> > str(data_vector)
> > 
> > This does not create any more complicated objects, just a matrix.
> > In 
> > some cases, the rows in the matrix are ordered S -> N, so it may 
> > appear the wrong way up.
> > 
> > rgdal::getRasterData() is lightweight, and has many arguments
> > which 
> > may be helpful. rgdal::readGDAL() is heavyweight, creating a 
> > SpatialGridDataFrame object. This involves much copying of data,
> > but 
> > the output object can be used for example in mapping or analysis 
> > without further conversion. My guess is that
> > rgdal::getRasterData() 
> > gives you your matrix directly. Look at the R code to see how
> > as.is= 
> > etc. work (files may include scale and offset values - recently a
> > user 
> > was confused that scale and offset were "magically" applied to
> > convert 
> > Uint16 values to degrees Kelvin on reading). For example, if as.is
> > == 
> > TRUE or scale == 1 and offset == 0, no copying of the input matrix 
> > occurs because it is not converted. If you could check this route, 
> > others following this thread could also benefit; if I'm wrong,
> > that 
> > would also be good to know.
> > 
> > Roger
> > 
> > 
> > > Best,
> > > 
> > > Thorsten
> > > 
> > > 
> > > 
> > > Am 27.04.2020 um 15:58 schrieb Michael Sumner:
> > > > Try stars it worked for me on a test
> > > > 
> > > > On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
> > > > <
> > > > thorsten.m.behrens using gmail.com
> > > >  <mailto:
> > > > thorsten.m.behrens using gmail.com
> > > > >>
> > > > wrote:
> > > > 
> > > >     Roger,
> > > > 
> > > >     thanks a lot for your reply!
> > > > 
> > > >     I have 256GB RAM installed (mentioned it somewhere). And
> > > > there,
> > > >     all is
> > > >     fine when I run:
> > > > 
> > > >     rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > extent(c(0, 
> > > > 72000,
> > > > 
> > > >     values(rDemTest) = 1
> > > > 
> > > >     When limiting the memory to about 8GB with
> > > >     ulimit::memory_limit(8000),
> > > >     the max size which can be allocated seems to be around
> > > > 10000 x
> > > >     10000px.
> > > >     In this case all tests run fine. Unfortunately it seems to
> > > > be
> > > >     related to
> > > >     the size of the grid (48000 x 72000) and therefore the
> > > > problem
> > > >     can't be
> > > >     reproduced on machines with 8GB RAM. For some processing
> > > > steps I 
> > > > need
> > > >     grids of that size in the memory, which is why I have 256
> > > > GB
> > > >     installed.
> > > > 
> > > >     Normally, I use the raster package and not
> > > > rgdal::readGDAL(). This
> > > >     was
> > > >     just a desperate attempt to find the source of the problem.
> > > > 
> > > >     This is what I use in my code:
> > > > 
> > > >     rDem = raster(sFileNameTiff)
> > > >     mDem = raster::as.matrix(rDem)
> > > > 
> > > >     But maybe this is the same...
> > > > 
> > > >     Any further suggestions are much appreciated!
> > > > 
> > > >     Thanks again!
> > > > 
> > > >     Best,
> > > > 
> > > >     Thorsten
> > > > 
> > > > 
> > > > 
> > > > 
> > > >     Am 27.04.2020 um 14:50 schrieb Roger Bivand:
> > > >    > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
> > > >    >
> > > >    >> Dear all,
> > > >    >>
> > > >    >> my problem is that I want to read a big geotiff raster
> > > > dataset
> > > >     into R
> > > >    >> and convert it to a matrix, which does not work.
> > > >    >> The file is big but there is sufficient memory. I need
> > > > all the
> > > >     data
> > > >    >> in the memory at the same time.
> > > >    >>
> > > >    >> The error occurs under R 3.6.3 as well as 4.0.0 using
> > > > Ubuntu 
> > > > 20.04
> > > >    >> LTS with the latest version of the packages (see session
> > > > info
> > > >     below)
> > > >    >> and 256GB RAM installed.
> > > >    >>
> > > >    >> When loading the raster dataset using rgdal (via readGDAL
> > > > or
> > > >    >> raster::readAll) I get the follwoing error in R 4.0.0:
> > > >    >>
> > > >    >> ```
> > > >    >> Error in rgdal::getRasterData(con, offset = offs,
> > > > region.dim =
> > > >     reg,
> > > >    >> band = object using data@band) :
> > > >    >>   long vectors not supported yet: memory.c:3782
> > > >    >> ```
> > > >    >
> > > >    > On a 16GB Fedora platform:
> > > >    >
> > > >    >> library(raster) # 3.1-5
> > > >    >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > extent(c(0,
> > > >     72000,
> > > >    > 0,
> > > >    > + 48000))) # all fine
> > > >    >> rDemTest
> > > >    > class      : RasterLayer
> > > >    > dimensions : 48000, 72000, 3.456e+09  (nrow, ncol, ncell)
> > > >    > resolution : 1, 1  (x, y)
> > > >    > extent     : 0, 72000, 0, 48000  (xmin, xmax, ymin, ymax)
> > > >    > crs        : NA
> > > >    >
> > > >    >> values(rDemTest) = 1
> > > >    > Error: cannot allocate vector of size 25.7 Gb
> > > >    >
> > > >    > So you are deceiving yourself into thinking that all is
> > > > fine at
> > > >     this
> > > >    > point. Please try to instantiate an example that can be
> > > >     reproduced on
> > > >    > a machine with 8GB RAM.
> > > >    >
> > > >    > Further note that rgdal::readGDAL() is not how you handle
> > > > very
> > > >     large
> > > >    > objects in files, and never has been. raster can handle
> > > > blocks
> > > >     of data
> > > >    > from bands in file; stars and gdalcubes can use proxy=TRUE
> > > > for the
> > > >    > same purpose. Why did you choose rgdal::readGDAL() when
> > > > this is 
> > > > not
> > > >    > its purpose?
> > > >    >
> > > >    > You did not say how much RAM is on your platform.
> > > >    >
> > > >    > Roger
> > > >    >
> > > >    >>
> > > >    >> In R 3.6.3 is is "... memory.c:3717"
> > > >    >>
> > > >    >> However, I can load the same file with the tiff package
> > > > and a
> > > >     file of
> > > >    >> the same size in the native raster package format (*.grd)
> > > > with 
> > > > the
> > > >    >> raster package but again not with the rgdal package.
> > > >    >>
> > > >    >> gdalinfo (gdalUtils) does not complain (see below).
> > > > Hence, Even
> > > >    >> Rouault assumes the problem is related to rgdal and not
> > > > gdal
> > > >    >> (
> > > > https://github.com/OSGeo/gdal/issues/2442
> > > > ).
> > > >    >>
> > > >    >> Below you find reproducible code, which generates a
> > > > raster file,
> > > >    >> saves the two formats (.tiff and .grd) and tries to read
> > > > them 
> > > > with
> > > >    >> the different packages.
> > > >    >>
> > > >    >> Is this a known limitation? Any help is greatly
> > > > appreciated!
> > > >    >>
> > > >    >> Thanks a lot in advance!
> > > >    >>
> > > >    >> Best wishes and stay healthy,
> > > >    >> Thorsten
> > > >    >>
> > > >    >>
> > > >    >>
> > > >    >> ### Steps to reproduce the problem.
> > > >    >>
> > > >    >> R code:
> > > >    >>
> > > >    >> ```
> > > >    >> library(rgdal) # 1.4-8
> > > >    >> library(raster) # 3.1-5
> > > >    >> library(tiff) # 0.1-5
> > > >    >>
> > > >    >> ## generate and manipulate a big raster dataset
> > > >    >> # - generate
> > > >    >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > extent(c(0,
> > > >    >> 72000, 0, 48000))) # all fine
> > > >    >>
> > > >    >> # - manipulate
> > > >    >> values(rDemTest) = 1 # all fine
> > > >    >>
> > > >    >> # - convert
> > > >    >> mDemTest = raster::as.matrix(rDemTest) # all fine
> > > >    >> str(mDemTest)
> > > >    >>
> > > >    >> ## save a big dataset
> > > >    >>
> > > >    >> # - as raster/gdal
> > > >    >> sFileNameTiff = "BigData.tif"
> > > >    >> writeRaster(rDemTest, sFileNameTiff, "GTiff", overwrite =
> > > > TRUE,
> > > >    >> NAflag = -9999) # all fine
> > > >    >>
> > > >    >> # - as raster native
> > > >    >> sFileNameNative = "BigData.grd"
> > > >    >> writeRaster(rDemTest, sFileNameNative, "raster",
> > > > overwrite = 
> > > > TRUE,
> > > >    >> NAflag = -9999) # all fine
> > > >    >>
> > > >    >>
> > > >    >> ## load the big raster datasets with different packages
> > > > and 
> > > > options
> > > >    >> # - load the tiff data with the gdal package via the
> > > > raster 
> > > > package
> > > >    >> rDem = raster(sFileNameTiff) # all fine
> > > >    >> extent(rDem) # all fine
> > > >    >> mDem = raster::as.matrix(rDem) # error
> > > >    >> rDem = readAll(rDem) # error
> > > >    >>
> > > >    >> # - load the native raster data with the raster package
> > > >    >> rDem = raster(sFileNameNative) # all fine
> > > >    >> extent(rDem) # all fine
> > > >    >> mDem = raster::as.matrix(rDem) # all fine
> > > >    >> str(mDem)
> > > >    >>
> > > >    >> # - load the tiff data with the tiff package
> > > >    >> mDem = readTIFF(sFileNameTiff) # all fine
> > > >    >> str(mDem)
> > > >    >>
> > > >    >> # - load the tiff data with the gdal package
> > > >    >> sfDem = readGDAL(sFileNameTiff) # error
> > > >    >>
> > > >    >> # - load the native raster data with the gdal package
> > > >    >> sfDem = readGDAL(sFileNameNative) # error
> > > >    >>
> > > >    >> ```
> > > >    >>
> > > >    >>
> > > >    >> ### Startup messages when rgdal is attached (requested by
> > > > Roger
> > > >     Bivand)
> > > >    >>>  library(rgdal)
> > > >    >> rgdal: version: 1.4-8, (SVN revision 845)
> > > >    >>  Geospatial Data Abstraction Library extensions to R
> > > >     successfully loaded
> > > >    >>  Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
> > > >    >>  Path to GDAL shared files:
> > > >    >>  GDAL binary built with GEOS: TRUE
> > > >    >>  Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020,
> > > >     [PJ_VERSION:
> > > >    >> 631]
> > > >    >>  Path to PROJ.4 shared files: (autodetected)
> > > >    >>  Linking to sp version: 1.4-1
> > > >    >>
> > > >    >>
> > > >    >> ### Session info
> > > >    >>>  sessionInfo()
> > > >    >> R version 4.0.0 (2020-04-24)
> > > >    >> Platform: x86_64-pc-linux-gnu (64-bit)
> > > >    >> Running under: Ubuntu 20.04 LTS
> > > >    >>
> > > >    >> Matrix products: default
> > > >    >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
> > > > pthread/libblas.so.3
> > > >    >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
> > > > pthread/liblapack.so.3
> > > >    >>
> > > >    >> locale:
> > > >    >>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
> > > > LC_TIME=de_DE.UTF-8
> > > >    >>  [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
> > > >    >> LC_MESSAGES=de_DE.UTF-8
> > > >    >>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C LC_ADDRESS=C
> > > >    >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
> > > >    >> LC_IDENTIFICATION=C
> > > >    >>
> > > >    >> attached base packages:
> > > >    >> [1] stats     graphics  grDevices utils datasets methods
> > > > base
> > > >    >>
> > > >    >> other attached packages:
> > > >    >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8       tiff_0.1-5
> > > > raster_3.1-5
> > > >    >> sp_1.4-1
> > > >    >>
> > > >    >> loaded via a namespace (and not attached):
> > > >    >>  [1] compiler_4.0.0    tools_4.0.0 Rcpp_1.0.4.6
> > > >    >> R.methodsS3_1.8.0 codetools_0.2-16
> > > >    >>  [6] grid_4.0.0        iterators_1.0.12 foreach_1.5.0
> > > >    >> R.utils_2.9.2     R.oo_1.23.0
> > > >    >> [11] lattice_0.20-41
> > > >    >>
> > > >    >>
> > > >    >> ### gdalInfo
> > > >    >>>  gdalinfo(sFileNameTiff)
> > > >    >>  [1] "Driver: GTiff/GeoTIFF"
> > > >    >>  [2] "Files: BigData.tif"
> > > >    >>  [3] "Size is 72000, 48000"
> > > >    >>  [4] "Origin = (0.000000000000000,48000.000000000000000)"
> > > >    >>  [5] "Pixel Size = (1.000000000000000,-
> > > > 1.000000000000000)"
> > > >    >>  [6] "Image Structure Metadata:"
> > > >    >>  [7] "  COMPRESSION=LZW"
> > > >    >>  [8] "  INTERLEAVE=BAND"
> > > >    >>  [9] "Corner Coordinates:"
> > > >    >> [10] "Upper Left  (       0.000,   48000.000) "
> > > >    >> [11] "Lower Left  (   0.0000000,   0.0000000) "
> > > >    >> [12] "Upper Right (   72000.000,   48000.000) "
> > > >    >> [13] "Lower Right (   72000.000,       0.000) "
> > > >    >> [14] "Center      (   36000.000,   24000.000) "
> > > >    >> [15] "Band 1 Block=72000x1 Type=Float32,
> > > > ColorInterp=Gray"
> > > >    >> [16] "  Min=1.000 Max=1.000 "
> > > >    >> [17] "  Minimum=1.000, Maximum=1.000, Mean=nan,
> > > > StdDev=nan"
> > > >    >> [18] "  NoData Value=-9999"
> > > >    >> [19] "  Metadata:"
> > > >    >> [20] "    STATISTICS_MAXIMUM=1"
> > > >    >> [21] "    STATISTICS_MEAN=nan"
> > > >    >> [22] "    STATISTICS_MINIMUM=1"
> > > >    >> [23] "    STATISTICS_STDDEV=nan"
> > > >    >>
> > > >    >> _______________________________________________
> > > >    >> R-sig-Geo mailing list
> > > >    >> 
> > > > R-sig-Geo using r-project.org
> > > >  <mailto:
> > > > R-sig-Geo using r-project.org
> > > > >
> > > >    >> 
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > 
> > > >    >>
> > > >    >>
> > > >    >
> > > > 
> > > >     _______________________________________________
> > > >     R-sig-Geo mailing list
> > > >     
> > > > R-sig-Geo using r-project.org
> > > >  <mailto:
> > > > R-sig-Geo using r-project.org
> > > > >
> > > >     
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > 
> > > > 
> > > 
> > >     [[alternative HTML version deleted]]
> > > 
> > > _______________________________________________
> > > R-sig-Geo mailing list
> > > R-sig-Geo using r-project.org
> > > 
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > 
> > > 
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> 
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



More information about the R-sig-Geo mailing list