[R-sig-Geo] rgdal: problem reading a bigger raster dataset (R 4.0.0/3.6.3, Ubuntu 20.04)
Mathias Moser
m@tmo@er @end|ng |rom wu@@c@@t
Tue Apr 28 22:14:18 CEST 2020
> I cannot check this because I have no access to a platform with
> enough
> RAM, so I need help here. I haven't been able to confirm that
> LENGTH(sRStorage) returns an R_xlen_t, or double, in lines 1672, 1685
> or
> 1693. Is there a way of generating a large file using a GDAL app,
> perhaps,
> then I could just try reading a larger file if LENGTH plays up?
LENGTH() can not handle the R_xlen_t and again throws a long vector
error. I have replaced the three occurrences with XLENGTH() and tested
using Thorsten's Ceil.tif file, looks good so far:
> > library(rgdal)
> Loading required package: sp
> rgdal: version: 1.5-8, (SVN revision (unknown))
> Geospatial Data Abstraction Library extensions to R successfully
> loaded
> Loaded GDAL runtime: GDAL 3.0.1, released 2019/06/28
> Path to GDAL shared files: /usr/local/share/gdal
> GDAL binary built with GEOS: TRUE
> Loaded PROJ runtime: Rel. 6.2.0, September 1st, 2019, [PJ_VERSION:
> 620]
> Path to PROJ shared files: /usr/local/share/proj
> Linking to sp version: 1.4-1
> > mCeil = raster::as.matrix(raster::raster("Ceil.tif"))
> > str(mCeil)
> num [1:46341, 1:46341] 1 1 1 1 1 1 1 1 1 1 ...
HTH, best wishes,
Mathias
>
> > (read_gdal_data() of stars _seems_ to use R_xlen_t instead to
> > accomodate for 52bits [2]).
> >
> > Best wishes,
> >
> > Mathias
> >
> >
> > [1]
> > https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
> >
> > [2]
> > https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172
> >
> >
> >
> > On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
> > > Roger and Mike,
> > >
> > > I really appreciate your help on this!
> > >
> > > I had a look at getRasterData(). It results in the same error.
> > > Hence,
> > > I
> > > made further tests where I compared grids with the following
> > > numbers
> > > of
> > > cols and rows:
> > >
> > > nPx = floor(sqrt(2^31 -1)) # 46340
> > >
> > > and
> > >
> > > nPx = ceiling(sqrt(2^31 -1)) # 46341
> > >
> > > The result is clear. Raster data with 46340 * 46340 px can be
> > > loaded
> > > with getRasterData() and with raster(), raster::as.matrix(),
> > > whereas
> > > datasets with 46341 * 46341 px cannot and result in the error:
> > >
> > > Error in getRasterData(gdCeil, band = 1) :
> > > long vectors not supported yet: memory.c:3782
> > >
> > > read_stars() works for both. You find the corresponding code
> > > below.
> > >
> > > Are there any other option I can try?
> > >
> > > Thorsten
> > >
> > >
> > > Reproducible code:
> > >
> > > ## generate raster datasets
> > > # 46340 * 46340 grid dataset
> > > sFileNameFloor = "Floor.tif"
> > >
> > > nPx = floor(sqrt(2^31 -1)) # 46340
> > > nPx
> > >
> > > rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
> > > nPx)))
> > > values(rFloor) = 1
> > >
> > > writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE,
> > > NAflag
> > > =
> > > -9999)
> > >
> > >
> > > # 46341 * 46341 grid dataset
> > > sFileNameCeil = "Ceil.tif"
> > >
> > > nPx = ceiling(sqrt(2^31 -1))
> > > nPx
> > >
> > > rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
> > > nPx)))
> > > values(rCeil) = 1
> > >
> > > writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE,
> > > NAflag
> > > =
> > > -9999)
> > >
> > >
> > > ## load raster datasets with different methods
> > >
> > > # load Ceil
> > > gdCeil = GDAL.open(sFileNameCeil)
> > > dim(gdCeil)
> > >
> > > vnCeil = getRasterData(gdCeil, band = 1) # error
> > >
> > > GDAL.close(gdCeil)
> > > str(vnCeil)
> > >
> > > stCeil = read_stars(sFileNameCeil) # all fine
> > > str(stCeil[[1]])
> > >
> > > rCeil = raster(sFileNameCeil)
> > > mCeil = raster::as.matrix(rCeil) # error
> > > str(mCeil)
> > >
> > >
> > > # load Floor
> > > gdFloor = GDAL.open(sFileNameFloor)
> > > dim(gdFloor)
> > >
> > > vnData = getRasterData(gdFloor, band = 1) # all fine
> > >
> > > GDAL.close(gdFloor)
> > > str(vnData)
> > >
> > > stFloor= read_stars(sFileNameFloor) # all fine
> > > str(stFloor[[1]])
> > >
> > > rFloor = raster(sFileNameFloor)
> > > mFloor = raster::as.matrix(rFloor) # all fine
> > > str(mFloor)
> > >
> > >
> > >
> > >
> > > Am 28.04.2020 um 12:10 schrieb Roger Bivand:
> > > > On Tue, 28 Apr 2020, Thorsten Behrens wrote:
> > > >
> > > > > Michael,
> > > > >
> > > > > Thanks for the hint, it seems to work! Real-world tests will
> > > > > follow in
> > > > > the next few days...
> > > > >
> > > > > So it definitely seems to be a problem of rgdal. It would be
> > > > > great if it
> > > > > could still be solved.
> > > >
> > > > Not rgdal, but your use of it. Try looking at a sequence of
> > > >
> > > > file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
> > > > obj <- GDAL.open(file)
> > > > (dims <- dim(obj))
> > > > band <- 1
> > > > data_vector <- getRasterData(obj, band)
> > > > GDAL.close(obj)
> > > > str(data_vector)
> > > >
> > > > This does not create any more complicated objects, just a
> > > > matrix.
> > > > In
> > > > some cases, the rows in the matrix are ordered S -> N, so it
> > > > may
> > > > appear the wrong way up.
> > > >
> > > > rgdal::getRasterData() is lightweight, and has many arguments
> > > > which
> > > > may be helpful. rgdal::readGDAL() is heavyweight, creating a
> > > > SpatialGridDataFrame object. This involves much copying of
> > > > data,
> > > > but
> > > > the output object can be used for example in mapping or
> > > > analysis
> > > > without further conversion. My guess is that
> > > > rgdal::getRasterData()
> > > > gives you your matrix directly. Look at the R code to see how
> > > > as.is=
> > > > etc. work (files may include scale and offset values - recently
> > > > a
> > > > user
> > > > was confused that scale and offset were "magically" applied to
> > > > convert
> > > > Uint16 values to degrees Kelvin on reading). For example, if
> > > > as.is
> > > > ==
> > > > TRUE or scale == 1 and offset == 0, no copying of the input
> > > > matrix
> > > > occurs because it is not converted. If you could check this
> > > > route,
> > > > others following this thread could also benefit; if I'm wrong,
> > > > that
> > > > would also be good to know.
> > > >
> > > > Roger
> > > >
> > > >
> > > > > Best,
> > > > >
> > > > > Thorsten
> > > > >
> > > > >
> > > > >
> > > > > Am 27.04.2020 um 15:58 schrieb Michael Sumner:
> > > > > > Try stars it worked for me on a test
> > > > > >
> > > > > > On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
> > > > > > <
> > > > > > thorsten.m.behrens using gmail.com
> > > > > >
> > > > > > <mailto:
> > > > > > thorsten.m.behrens using gmail.com
> > > > > >
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > Roger,
> > > > > >
> > > > > > thanks a lot for your reply!
> > > > > >
> > > > > > I have 256GB RAM installed (mentioned it somewhere).
> > > > > > And
> > > > > > there,
> > > > > > all is
> > > > > > fine when I run:
> > > > > >
> > > > > > rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > > > extent(c(0,
> > > > > > 72000,
> > > > > >
> > > > > > values(rDemTest) = 1
> > > > > >
> > > > > > When limiting the memory to about 8GB with
> > > > > > ulimit::memory_limit(8000),
> > > > > > the max size which can be allocated seems to be around
> > > > > > 10000 x
> > > > > > 10000px.
> > > > > > In this case all tests run fine. Unfortunately it seems
> > > > > > to
> > > > > > be
> > > > > > related to
> > > > > > the size of the grid (48000 x 72000) and therefore the
> > > > > > problem
> > > > > > can't be
> > > > > > reproduced on machines with 8GB RAM. For some
> > > > > > processing
> > > > > > steps I
> > > > > > need
> > > > > > grids of that size in the memory, which is why I have
> > > > > > 256
> > > > > > GB
> > > > > > installed.
> > > > > >
> > > > > > Normally, I use the raster package and not
> > > > > > rgdal::readGDAL(). This
> > > > > > was
> > > > > > just a desperate attempt to find the source of the
> > > > > > problem.
> > > > > >
> > > > > > This is what I use in my code:
> > > > > >
> > > > > > rDem = raster(sFileNameTiff)
> > > > > > mDem = raster::as.matrix(rDem)
> > > > > >
> > > > > > But maybe this is the same...
> > > > > >
> > > > > > Any further suggestions are much appreciated!
> > > > > >
> > > > > > Thanks again!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Thorsten
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Am 27.04.2020 um 14:50 schrieb Roger Bivand:
> > > > > > > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
> > > > > > >
> > > > > > >> Dear all,
> > > > > > >>
> > > > > > >> my problem is that I want to read a big geotiff raster
> > > > > > dataset
> > > > > > into R
> > > > > > >> and convert it to a matrix, which does not work.
> > > > > > >> The file is big but there is sufficient memory. I need
> > > > > > all the
> > > > > > data
> > > > > > >> in the memory at the same time.
> > > > > > >>
> > > > > > >> The error occurs under R 3.6.3 as well as 4.0.0 using
> > > > > > Ubuntu
> > > > > > 20.04
> > > > > > >> LTS with the latest version of the packages (see
> > > > > > session
> > > > > > info
> > > > > > below)
> > > > > > >> and 256GB RAM installed.
> > > > > > >>
> > > > > > >> When loading the raster dataset using rgdal (via
> > > > > > readGDAL
> > > > > > or
> > > > > > >> raster::readAll) I get the follwoing error in R 4.0.0:
> > > > > > >>
> > > > > > >> ```
> > > > > > >> Error in rgdal::getRasterData(con, offset = offs,
> > > > > > region.dim =
> > > > > > reg,
> > > > > > >> band = object using data@band) :
> > > > > > >> long vectors not supported yet: memory.c:3782
> > > > > > >> ```
> > > > > > >
> > > > > > > On a 16GB Fedora platform:
> > > > > > >
> > > > > > >> library(raster) # 3.1-5
> > > > > > >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > > > extent(c(0,
> > > > > > 72000,
> > > > > > > 0,
> > > > > > > + 48000))) # all fine
> > > > > > >> rDemTest
> > > > > > > class : RasterLayer
> > > > > > > dimensions : 48000, 72000, 3.456e+09 (nrow, ncol,
> > > > > > ncell)
> > > > > > > resolution : 1, 1 (x, y)
> > > > > > > extent : 0, 72000, 0, 48000 (xmin, xmax, ymin,
> > > > > > ymax)
> > > > > > > crs : NA
> > > > > > >
> > > > > > >> values(rDemTest) = 1
> > > > > > > Error: cannot allocate vector of size 25.7 Gb
> > > > > > >
> > > > > > > So you are deceiving yourself into thinking that all is
> > > > > > fine at
> > > > > > this
> > > > > > > point. Please try to instantiate an example that can be
> > > > > > reproduced on
> > > > > > > a machine with 8GB RAM.
> > > > > > >
> > > > > > > Further note that rgdal::readGDAL() is not how you
> > > > > > handle
> > > > > > very
> > > > > > large
> > > > > > > objects in files, and never has been. raster can handle
> > > > > > blocks
> > > > > > of data
> > > > > > > from bands in file; stars and gdalcubes can use
> > > > > > proxy=TRUE
> > > > > > for the
> > > > > > > same purpose. Why did you choose rgdal::readGDAL() when
> > > > > > this is
> > > > > > not
> > > > > > > its purpose?
> > > > > > >
> > > > > > > You did not say how much RAM is on your platform.
> > > > > > >
> > > > > > > Roger
> > > > > > >
> > > > > > >>
> > > > > > >> In R 3.6.3 is is "... memory.c:3717"
> > > > > > >>
> > > > > > >> However, I can load the same file with the tiff
> > > > > > package
> > > > > > and a
> > > > > > file of
> > > > > > >> the same size in the native raster package format
> > > > > > (*.grd)
> > > > > > with
> > > > > > the
> > > > > > >> raster package but again not with the rgdal package.
> > > > > > >>
> > > > > > >> gdalinfo (gdalUtils) does not complain (see below).
> > > > > > Hence, Even
> > > > > > >> Rouault assumes the problem is related to rgdal and
> > > > > > not
> > > > > > gdal
> > > > > > >> (
> > > > > > https://github.com/OSGeo/gdal/issues/2442
> > > > > >
> > > > > > ).
> > > > > > >>
> > > > > > >> Below you find reproducible code, which generates a
> > > > > > raster file,
> > > > > > >> saves the two formats (.tiff and .grd) and tries to
> > > > > > read
> > > > > > them
> > > > > > with
> > > > > > >> the different packages.
> > > > > > >>
> > > > > > >> Is this a known limitation? Any help is greatly
> > > > > > appreciated!
> > > > > > >>
> > > > > > >> Thanks a lot in advance!
> > > > > > >>
> > > > > > >> Best wishes and stay healthy,
> > > > > > >> Thorsten
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> ### Steps to reproduce the problem.
> > > > > > >>
> > > > > > >> R code:
> > > > > > >>
> > > > > > >> ```
> > > > > > >> library(rgdal) # 1.4-8
> > > > > > >> library(raster) # 3.1-5
> > > > > > >> library(tiff) # 0.1-5
> > > > > > >>
> > > > > > >> ## generate and manipulate a big raster dataset
> > > > > > >> # - generate
> > > > > > >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
> > > > > > extent(c(0,
> > > > > > >> 72000, 0, 48000))) # all fine
> > > > > > >>
> > > > > > >> # - manipulate
> > > > > > >> values(rDemTest) = 1 # all fine
> > > > > > >>
> > > > > > >> # - convert
> > > > > > >> mDemTest = raster::as.matrix(rDemTest) # all fine
> > > > > > >> str(mDemTest)
> > > > > > >>
> > > > > > >> ## save a big dataset
> > > > > > >>
> > > > > > >> # - as raster/gdal
> > > > > > >> sFileNameTiff = "BigData.tif"
> > > > > > >> writeRaster(rDemTest, sFileNameTiff, "GTiff",
> > > > > > overwrite =
> > > > > > TRUE,
> > > > > > >> NAflag = -9999) # all fine
> > > > > > >>
> > > > > > >> # - as raster native
> > > > > > >> sFileNameNative = "BigData.grd"
> > > > > > >> writeRaster(rDemTest, sFileNameNative, "raster",
> > > > > > overwrite =
> > > > > > TRUE,
> > > > > > >> NAflag = -9999) # all fine
> > > > > > >>
> > > > > > >>
> > > > > > >> ## load the big raster datasets with different
> > > > > > packages
> > > > > > and
> > > > > > options
> > > > > > >> # - load the tiff data with the gdal package via the
> > > > > > raster
> > > > > > package
> > > > > > >> rDem = raster(sFileNameTiff) # all fine
> > > > > > >> extent(rDem) # all fine
> > > > > > >> mDem = raster::as.matrix(rDem) # error
> > > > > > >> rDem = readAll(rDem) # error
> > > > > > >>
> > > > > > >> # - load the native raster data with the raster
> > > > > > package
> > > > > > >> rDem = raster(sFileNameNative) # all fine
> > > > > > >> extent(rDem) # all fine
> > > > > > >> mDem = raster::as.matrix(rDem) # all fine
> > > > > > >> str(mDem)
> > > > > > >>
> > > > > > >> # - load the tiff data with the tiff package
> > > > > > >> mDem = readTIFF(sFileNameTiff) # all fine
> > > > > > >> str(mDem)
> > > > > > >>
> > > > > > >> # - load the tiff data with the gdal package
> > > > > > >> sfDem = readGDAL(sFileNameTiff) # error
> > > > > > >>
> > > > > > >> # - load the native raster data with the gdal package
> > > > > > >> sfDem = readGDAL(sFileNameNative) # error
> > > > > > >>
> > > > > > >> ```
> > > > > > >>
> > > > > > >>
> > > > > > >> ### Startup messages when rgdal is attached (requested
> > > > > > by
> > > > > > Roger
> > > > > > Bivand)
> > > > > > >>> library(rgdal)
> > > > > > >> rgdal: version: 1.4-8, (SVN revision 845)
> > > > > > >> Geospatial Data Abstraction Library extensions to R
> > > > > > successfully loaded
> > > > > > >> Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
> > > > > > >> Path to GDAL shared files:
> > > > > > >> GDAL binary built with GEOS: TRUE
> > > > > > >> Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th,
> > > > > > 2020,
> > > > > > [PJ_VERSION:
> > > > > > >> 631]
> > > > > > >> Path to PROJ.4 shared files: (autodetected)
> > > > > > >> Linking to sp version: 1.4-1
> > > > > > >>
> > > > > > >>
> > > > > > >> ### Session info
> > > > > > >>> sessionInfo()
> > > > > > >> R version 4.0.0 (2020-04-24)
> > > > > > >> Platform: x86_64-pc-linux-gnu (64-bit)
> > > > > > >> Running under: Ubuntu 20.04 LTS
> > > > > > >>
> > > > > > >> Matrix products: default
> > > > > > >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
> > > > > > pthread/libblas.so.3
> > > > > > >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
> > > > > > pthread/liblapack.so.3
> > > > > > >>
> > > > > > >> locale:
> > > > > > >> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> > > > > > LC_TIME=de_DE.UTF-8
> > > > > > >> [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
> > > > > > >> LC_MESSAGES=de_DE.UTF-8
> > > > > > >> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C
> > > > > > >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
> > > > > > >> LC_IDENTIFICATION=C
> > > > > > >>
> > > > > > >> attached base packages:
> > > > > > >> [1] stats graphics grDevices utils datasets
> > > > > > methods
> > > > > > base
> > > > > > >>
> > > > > > >> other attached packages:
> > > > > > >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8 tiff_0.1-5
> > > > > > raster_3.1-5
> > > > > > >> sp_1.4-1
> > > > > > >>
> > > > > > >> loaded via a namespace (and not attached):
> > > > > > >> [1] compiler_4.0.0 tools_4.0.0 Rcpp_1.0.4.6
> > > > > > >> R.methodsS3_1.8.0 codetools_0.2-16
> > > > > > >> [6] grid_4.0.0 iterators_1.0.12 foreach_1.5.0
> > > > > > >> R.utils_2.9.2 R.oo_1.23.0
> > > > > > >> [11] lattice_0.20-41
> > > > > > >>
> > > > > > >>
> > > > > > >> ### gdalInfo
> > > > > > >>> gdalinfo(sFileNameTiff)
> > > > > > >> [1] "Driver: GTiff/GeoTIFF"
> > > > > > >> [2] "Files: BigData.tif"
> > > > > > >> [3] "Size is 72000, 48000"
> > > > > > >> [4] "Origin =
> > > > > > (0.000000000000000,48000.000000000000000)"
> > > > > > >> [5] "Pixel Size = (1.000000000000000,-
> > > > > > 1.000000000000000)"
> > > > > > >> [6] "Image Structure Metadata:"
> > > > > > >> [7] " COMPRESSION=LZW"
> > > > > > >> [8] " INTERLEAVE=BAND"
> > > > > > >> [9] "Corner Coordinates:"
> > > > > > >> [10] "Upper Left ( 0.000, 48000.000) "
> > > > > > >> [11] "Lower Left ( 0.0000000, 0.0000000) "
> > > > > > >> [12] "Upper Right ( 72000.000, 48000.000) "
> > > > > > >> [13] "Lower Right ( 72000.000, 0.000) "
> > > > > > >> [14] "Center ( 36000.000, 24000.000) "
> > > > > > >> [15] "Band 1 Block=72000x1 Type=Float32,
> > > > > > ColorInterp=Gray"
> > > > > > >> [16] " Min=1.000 Max=1.000 "
> > > > > > >> [17] " Minimum=1.000, Maximum=1.000, Mean=nan,
> > > > > > StdDev=nan"
> > > > > > >> [18] " NoData Value=-9999"
> > > > > > >> [19] " Metadata:"
> > > > > > >> [20] " STATISTICS_MAXIMUM=1"
> > > > > > >> [21] " STATISTICS_MEAN=nan"
> > > > > > >> [22] " STATISTICS_MINIMUM=1"
> > > > > > >> [23] " STATISTICS_STDDEV=nan"
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> R-sig-Geo mailing list
> > > > > > >>
> > > > > > R-sig-Geo using r-project.org
> > > > > >
> > > > > > <mailto:
> > > > > > R-sig-Geo using r-project.org
> > > > > >
> > > > > >
> > > > > > >>
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > > >
> > > > > >
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > R-sig-Geo mailing list
> > > > > >
> > > > > > R-sig-Geo using r-project.org
> > > > > >
> > > > > > <mailto:
> > > > > > R-sig-Geo using r-project.org
> > > > > >
> > > > > >
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > [[alternative HTML version deleted]]
> > > > >
> > > > > _______________________________________________
> > > > > R-sig-Geo mailing list
> > > > > R-sig-Geo using r-project.org
> > > > >
> > > > >
> > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > >
> > > > >
> > > > >
> > >
> > > _______________________________________________
> > > R-sig-Geo mailing list
> > > R-sig-Geo using r-project.org
> > >
> > >
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > >
> > >
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo using r-project.org
> >
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> >
More information about the R-sig-Geo
mailing list