[R-sig-Geo] rgdal: problem reading a bigger raster dataset (R 4.0.0/3.6.3, Ubuntu 20.04)

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Tue Apr 28 19:41:10 CEST 2020


On Tue, 28 Apr 2020, Mathias Moser wrote:

> Dear all,
>
> sorry if this is an obvious observation (or even completely wrong), but
> I got interested in what's happening here: I think you are hitting the
> long vector limit of R [1] and from a look at rgdal's code this happens
> because it uses 32-bit types INTSXP/REALSXP in RGDAL_GetRasterData()
> which are limited to 2^31-1 n. Making getRasterData() work for your
> data set would probably require patching rgdal for long vector support.

This looks probable, I'd got to the same point, as now I can max out swap 
on assigning values. Changes to lines 1574 and 1664 in 
src/gdal-bindings.cpp committed to SVN, revision 961, will be installable 
with

https://r-forge.r-project.org/R/?group_id=884 and if status green,
install.packages("rgdal", repos="http://R-Forge.R-project.org")

in about 2 hours (R-Forge builds tarballs, slowly), or source tree now by

svn checkout svn://scm.r-forge.r-project.org/svnroot/rgdal/

I cannot check this because I have no access to a platform with enough 
RAM, so I need help here. I haven't been able to confirm that 
LENGTH(sRStorage) returns an R_xlen_t, or double, in lines 1672, 1685 or 
1693. Is there a way of generating a large file using a GDAL app, perhaps, 
then I could just try reading a larger file if LENGTH plays up?

Roger

>
> (read_gdal_data() of stars _seems_ to use R_xlen_t instead to
> accomodate for 52bits [2]).
>
> Best wishes,
>
> Mathias
>
>
> [1]
> https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
> [2]
> https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172
>
>
> On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
>> Roger and Mike,
>>
>> I really appreciate your help on this!
>>
>> I had a look at getRasterData(). It results in the same error. Hence,
>> I
>> made further tests where I compared grids with the following numbers
>> of
>> cols and rows:
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>>
>> and
>>
>> nPx = ceiling(sqrt(2^31 -1)) # 46341
>>
>> The result is clear. Raster data with 46340 * 46340 px can be loaded
>> with getRasterData() and with raster(), raster::as.matrix(), whereas
>> datasets with 46341 * 46341 px cannot and result in the error:
>>
>> Error in getRasterData(gdCeil, band = 1) :
>> long vectors not supported yet: memory.c:3782
>>
>> read_stars() works for both. You find the corresponding code below.
>>
>> Are there any other option I can try?
>>
>> Thorsten
>>
>>
>> Reproducible code:
>>
>> ## generate raster datasets
>> # 46340 * 46340 grid dataset
>> sFileNameFloor = "Floor.tif"
>>
>> nPx = floor(sqrt(2^31 -1)) # 46340
>> nPx
>>
>> rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rFloor) = 1
>>
>> writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> # 46341 * 46341 grid dataset
>> sFileNameCeil = "Ceil.tif"
>>
>> nPx = ceiling(sqrt(2^31 -1))
>> nPx
>>
>> rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
>> nPx)))
>> values(rCeil) = 1
>>
>> writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE, NAflag
>> =
>> -9999)
>>
>>
>> ## load raster datasets with different methods
>>
>> # load Ceil
>> gdCeil = GDAL.open(sFileNameCeil)
>> dim(gdCeil)
>>
>> vnCeil = getRasterData(gdCeil, band = 1) # error
>>
>> GDAL.close(gdCeil)
>> str(vnCeil)
>>
>> stCeil = read_stars(sFileNameCeil) # all fine
>> str(stCeil[[1]])
>>
>> rCeil = raster(sFileNameCeil)
>> mCeil = raster::as.matrix(rCeil) # error
>> str(mCeil)
>>
>>
>> # load Floor
>> gdFloor = GDAL.open(sFileNameFloor)
>> dim(gdFloor)
>>
>> vnData = getRasterData(gdFloor, band = 1) # all fine
>>
>> GDAL.close(gdFloor)
>> str(vnData)
>>
>> stFloor= read_stars(sFileNameFloor) # all fine
>> str(stFloor[[1]])
>>
>> rFloor = raster(sFileNameFloor)
>> mFloor = raster::as.matrix(rFloor) # all fine
>> str(mFloor)
>>
>>
>>
>>
>> Am 28.04.2020 um 12:10 schrieb Roger Bivand:
>>> On Tue, 28 Apr 2020, Thorsten Behrens wrote:
>>>
>>>> Michael,
>>>>
>>>> Thanks for the hint, it seems to work! Real-world tests will
>>>> follow in
>>>> the next few days...
>>>>
>>>> So it definitely seems to be a problem of rgdal. It would be
>>>> great if it
>>>> could still be solved.
>>>
>>> Not rgdal, but your use of it. Try looking at a sequence of
>>>
>>> file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
>>> obj <- GDAL.open(file)
>>> (dims <- dim(obj))
>>> band <- 1
>>> data_vector <- getRasterData(obj, band)
>>> GDAL.close(obj)
>>> str(data_vector)
>>>
>>> This does not create any more complicated objects, just a matrix.
>>> In
>>> some cases, the rows in the matrix are ordered S -> N, so it may
>>> appear the wrong way up.
>>>
>>> rgdal::getRasterData() is lightweight, and has many arguments
>>> which
>>> may be helpful. rgdal::readGDAL() is heavyweight, creating a
>>> SpatialGridDataFrame object. This involves much copying of data,
>>> but
>>> the output object can be used for example in mapping or analysis
>>> without further conversion. My guess is that
>>> rgdal::getRasterData()
>>> gives you your matrix directly. Look at the R code to see how
>>> as.is=
>>> etc. work (files may include scale and offset values - recently a
>>> user
>>> was confused that scale and offset were "magically" applied to
>>> convert
>>> Uint16 values to degrees Kelvin on reading). For example, if as.is
>>> ==
>>> TRUE or scale == 1 and offset == 0, no copying of the input matrix
>>> occurs because it is not converted. If you could check this route,
>>> others following this thread could also benefit; if I'm wrong,
>>> that
>>> would also be good to know.
>>>
>>> Roger
>>>
>>>
>>>> Best,
>>>>
>>>> Thorsten
>>>>
>>>>
>>>>
>>>> Am 27.04.2020 um 15:58 schrieb Michael Sumner:
>>>>> Try stars it worked for me on a test
>>>>>
>>>>> On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
>>>>> <
>>>>> thorsten.m.behrens using gmail.com
>>>>>  <mailto:
>>>>> thorsten.m.behrens using gmail.com
>>>>>>>
>>>>> wrote:
>>>>>
>>>>>     Roger,
>>>>>
>>>>>     thanks a lot for your reply!
>>>>>
>>>>>     I have 256GB RAM installed (mentioned it somewhere). And
>>>>> there,
>>>>>     all is
>>>>>     fine when I run:
>>>>>
>>>>>     rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>> 72000,
>>>>>
>>>>>     values(rDemTest) = 1
>>>>>
>>>>>     When limiting the memory to about 8GB with
>>>>>     ulimit::memory_limit(8000),
>>>>>     the max size which can be allocated seems to be around
>>>>> 10000 x
>>>>>     10000px.
>>>>>     In this case all tests run fine. Unfortunately it seems to
>>>>> be
>>>>>     related to
>>>>>     the size of the grid (48000 x 72000) and therefore the
>>>>> problem
>>>>>     can't be
>>>>>     reproduced on machines with 8GB RAM. For some processing
>>>>> steps I
>>>>> need
>>>>>     grids of that size in the memory, which is why I have 256
>>>>> GB
>>>>>     installed.
>>>>>
>>>>>     Normally, I use the raster package and not
>>>>> rgdal::readGDAL(). This
>>>>>     was
>>>>>     just a desperate attempt to find the source of the problem.
>>>>>
>>>>>     This is what I use in my code:
>>>>>
>>>>>     rDem = raster(sFileNameTiff)
>>>>>     mDem = raster::as.matrix(rDem)
>>>>>
>>>>>     But maybe this is the same...
>>>>>
>>>>>     Any further suggestions are much appreciated!
>>>>>
>>>>>     Thanks again!
>>>>>
>>>>>     Best,
>>>>>
>>>>>     Thorsten
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     Am 27.04.2020 um 14:50 schrieb Roger Bivand:
>>>>>   > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
>>>>>   >
>>>>>   >> Dear all,
>>>>>   >>
>>>>>   >> my problem is that I want to read a big geotiff raster
>>>>> dataset
>>>>>     into R
>>>>>   >> and convert it to a matrix, which does not work.
>>>>>   >> The file is big but there is sufficient memory. I need
>>>>> all the
>>>>>     data
>>>>>   >> in the memory at the same time.
>>>>>   >>
>>>>>   >> The error occurs under R 3.6.3 as well as 4.0.0 using
>>>>> Ubuntu
>>>>> 20.04
>>>>>   >> LTS with the latest version of the packages (see session
>>>>> info
>>>>>     below)
>>>>>   >> and 256GB RAM installed.
>>>>>   >>
>>>>>   >> When loading the raster dataset using rgdal (via readGDAL
>>>>> or
>>>>>   >> raster::readAll) I get the follwoing error in R 4.0.0:
>>>>>   >>
>>>>>   >> ```
>>>>>   >> Error in rgdal::getRasterData(con, offset = offs,
>>>>> region.dim =
>>>>>     reg,
>>>>>   >> band = object using data@band) :
>>>>>   >>   long vectors not supported yet: memory.c:3782
>>>>>   >> ```
>>>>>   >
>>>>>   > On a 16GB Fedora platform:
>>>>>   >
>>>>>   >> library(raster) # 3.1-5
>>>>>   >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>>     72000,
>>>>>   > 0,
>>>>>   > + 48000))) # all fine
>>>>>   >> rDemTest
>>>>>   > class      : RasterLayer
>>>>>   > dimensions : 48000, 72000, 3.456e+09  (nrow, ncol, ncell)
>>>>>   > resolution : 1, 1  (x, y)
>>>>>   > extent     : 0, 72000, 0, 48000  (xmin, xmax, ymin, ymax)
>>>>>   > crs        : NA
>>>>>   >
>>>>>   >> values(rDemTest) = 1
>>>>>   > Error: cannot allocate vector of size 25.7 Gb
>>>>>   >
>>>>>   > So you are deceiving yourself into thinking that all is
>>>>> fine at
>>>>>     this
>>>>>   > point. Please try to instantiate an example that can be
>>>>>     reproduced on
>>>>>   > a machine with 8GB RAM.
>>>>>   >
>>>>>   > Further note that rgdal::readGDAL() is not how you handle
>>>>> very
>>>>>     large
>>>>>   > objects in files, and never has been. raster can handle
>>>>> blocks
>>>>>     of data
>>>>>   > from bands in file; stars and gdalcubes can use proxy=TRUE
>>>>> for the
>>>>>   > same purpose. Why did you choose rgdal::readGDAL() when
>>>>> this is
>>>>> not
>>>>>   > its purpose?
>>>>>   >
>>>>>   > You did not say how much RAM is on your platform.
>>>>>   >
>>>>>   > Roger
>>>>>   >
>>>>>   >>
>>>>>   >> In R 3.6.3 is is "... memory.c:3717"
>>>>>   >>
>>>>>   >> However, I can load the same file with the tiff package
>>>>> and a
>>>>>     file of
>>>>>   >> the same size in the native raster package format (*.grd)
>>>>> with
>>>>> the
>>>>>   >> raster package but again not with the rgdal package.
>>>>>   >>
>>>>>   >> gdalinfo (gdalUtils) does not complain (see below).
>>>>> Hence, Even
>>>>>   >> Rouault assumes the problem is related to rgdal and not
>>>>> gdal
>>>>>   >> (
>>>>> https://github.com/OSGeo/gdal/issues/2442
>>>>> ).
>>>>>   >>
>>>>>   >> Below you find reproducible code, which generates a
>>>>> raster file,
>>>>>   >> saves the two formats (.tiff and .grd) and tries to read
>>>>> them
>>>>> with
>>>>>   >> the different packages.
>>>>>   >>
>>>>>   >> Is this a known limitation? Any help is greatly
>>>>> appreciated!
>>>>>   >>
>>>>>   >> Thanks a lot in advance!
>>>>>   >>
>>>>>   >> Best wishes and stay healthy,
>>>>>   >> Thorsten
>>>>>   >>
>>>>>   >>
>>>>>   >>
>>>>>   >> ### Steps to reproduce the problem.
>>>>>   >>
>>>>>   >> R code:
>>>>>   >>
>>>>>   >> ```
>>>>>   >> library(rgdal) # 1.4-8
>>>>>   >> library(raster) # 3.1-5
>>>>>   >> library(tiff) # 0.1-5
>>>>>   >>
>>>>>   >> ## generate and manipulate a big raster dataset
>>>>>   >> # - generate
>>>>>   >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
>>>>> extent(c(0,
>>>>>   >> 72000, 0, 48000))) # all fine
>>>>>   >>
>>>>>   >> # - manipulate
>>>>>   >> values(rDemTest) = 1 # all fine
>>>>>   >>
>>>>>   >> # - convert
>>>>>   >> mDemTest = raster::as.matrix(rDemTest) # all fine
>>>>>   >> str(mDemTest)
>>>>>   >>
>>>>>   >> ## save a big dataset
>>>>>   >>
>>>>>   >> # - as raster/gdal
>>>>>   >> sFileNameTiff = "BigData.tif"
>>>>>   >> writeRaster(rDemTest, sFileNameTiff, "GTiff", overwrite =
>>>>> TRUE,
>>>>>   >> NAflag = -9999) # all fine
>>>>>   >>
>>>>>   >> # - as raster native
>>>>>   >> sFileNameNative = "BigData.grd"
>>>>>   >> writeRaster(rDemTest, sFileNameNative, "raster",
>>>>> overwrite =
>>>>> TRUE,
>>>>>   >> NAflag = -9999) # all fine
>>>>>   >>
>>>>>   >>
>>>>>   >> ## load the big raster datasets with different packages
>>>>> and
>>>>> options
>>>>>   >> # - load the tiff data with the gdal package via the
>>>>> raster
>>>>> package
>>>>>   >> rDem = raster(sFileNameTiff) # all fine
>>>>>   >> extent(rDem) # all fine
>>>>>   >> mDem = raster::as.matrix(rDem) # error
>>>>>   >> rDem = readAll(rDem) # error
>>>>>   >>
>>>>>   >> # - load the native raster data with the raster package
>>>>>   >> rDem = raster(sFileNameNative) # all fine
>>>>>   >> extent(rDem) # all fine
>>>>>   >> mDem = raster::as.matrix(rDem) # all fine
>>>>>   >> str(mDem)
>>>>>   >>
>>>>>   >> # - load the tiff data with the tiff package
>>>>>   >> mDem = readTIFF(sFileNameTiff) # all fine
>>>>>   >> str(mDem)
>>>>>   >>
>>>>>   >> # - load the tiff data with the gdal package
>>>>>   >> sfDem = readGDAL(sFileNameTiff) # error
>>>>>   >>
>>>>>   >> # - load the native raster data with the gdal package
>>>>>   >> sfDem = readGDAL(sFileNameNative) # error
>>>>>   >>
>>>>>   >> ```
>>>>>   >>
>>>>>   >>
>>>>>   >> ### Startup messages when rgdal is attached (requested by
>>>>> Roger
>>>>>     Bivand)
>>>>>   >>>  library(rgdal)
>>>>>   >> rgdal: version: 1.4-8, (SVN revision 845)
>>>>>   >>  Geospatial Data Abstraction Library extensions to R
>>>>>     successfully loaded
>>>>>   >>  Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
>>>>>   >>  Path to GDAL shared files:
>>>>>   >>  GDAL binary built with GEOS: TRUE
>>>>>   >>  Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020,
>>>>>     [PJ_VERSION:
>>>>>   >> 631]
>>>>>   >>  Path to PROJ.4 shared files: (autodetected)
>>>>>   >>  Linking to sp version: 1.4-1
>>>>>   >>
>>>>>   >>
>>>>>   >> ### Session info
>>>>>   >>>  sessionInfo()
>>>>>   >> R version 4.0.0 (2020-04-24)
>>>>>   >> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>   >> Running under: Ubuntu 20.04 LTS
>>>>>   >>
>>>>>   >> Matrix products: default
>>>>>   >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/libblas.so.3
>>>>>   >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
>>>>> pthread/liblapack.so.3
>>>>>   >>
>>>>>   >> locale:
>>>>>   >>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>>>>> LC_TIME=de_DE.UTF-8
>>>>>   >>  [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
>>>>>   >> LC_MESSAGES=de_DE.UTF-8
>>>>>   >>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C LC_ADDRESS=C
>>>>>   >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
>>>>>   >> LC_IDENTIFICATION=C
>>>>>   >>
>>>>>   >> attached base packages:
>>>>>   >> [1] stats     graphics  grDevices utils datasets methods
>>>>> base
>>>>>   >>
>>>>>   >> other attached packages:
>>>>>   >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8       tiff_0.1-5
>>>>> raster_3.1-5
>>>>>   >> sp_1.4-1
>>>>>   >>
>>>>>   >> loaded via a namespace (and not attached):
>>>>>   >>  [1] compiler_4.0.0    tools_4.0.0 Rcpp_1.0.4.6
>>>>>   >> R.methodsS3_1.8.0 codetools_0.2-16
>>>>>   >>  [6] grid_4.0.0        iterators_1.0.12 foreach_1.5.0
>>>>>   >> R.utils_2.9.2     R.oo_1.23.0
>>>>>   >> [11] lattice_0.20-41
>>>>>   >>
>>>>>   >>
>>>>>   >> ### gdalInfo
>>>>>   >>>  gdalinfo(sFileNameTiff)
>>>>>   >>  [1] "Driver: GTiff/GeoTIFF"
>>>>>   >>  [2] "Files: BigData.tif"
>>>>>   >>  [3] "Size is 72000, 48000"
>>>>>   >>  [4] "Origin = (0.000000000000000,48000.000000000000000)"
>>>>>   >>  [5] "Pixel Size = (1.000000000000000,-
>>>>> 1.000000000000000)"
>>>>>   >>  [6] "Image Structure Metadata:"
>>>>>   >>  [7] "  COMPRESSION=LZW"
>>>>>   >>  [8] "  INTERLEAVE=BAND"
>>>>>   >>  [9] "Corner Coordinates:"
>>>>>   >> [10] "Upper Left  (       0.000,   48000.000) "
>>>>>   >> [11] "Lower Left  (   0.0000000,   0.0000000) "
>>>>>   >> [12] "Upper Right (   72000.000,   48000.000) "
>>>>>   >> [13] "Lower Right (   72000.000,       0.000) "
>>>>>   >> [14] "Center      (   36000.000,   24000.000) "
>>>>>   >> [15] "Band 1 Block=72000x1 Type=Float32,
>>>>> ColorInterp=Gray"
>>>>>   >> [16] "  Min=1.000 Max=1.000 "
>>>>>   >> [17] "  Minimum=1.000, Maximum=1.000, Mean=nan,
>>>>> StdDev=nan"
>>>>>   >> [18] "  NoData Value=-9999"
>>>>>   >> [19] "  Metadata:"
>>>>>   >> [20] "    STATISTICS_MAXIMUM=1"
>>>>>   >> [21] "    STATISTICS_MEAN=nan"
>>>>>   >> [22] "    STATISTICS_MINIMUM=1"
>>>>>   >> [23] "    STATISTICS_STDDEV=nan"
>>>>>   >>
>>>>>   >> _______________________________________________
>>>>>   >> R-sig-Geo mailing list
>>>>>   >>
>>>>> R-sig-Geo using r-project.org
>>>>>  <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>>>
>>>>>   >>
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>   >>
>>>>>   >>
>>>>>   >
>>>>>
>>>>>     _______________________________________________
>>>>>     R-sig-Geo mailing list
>>>>>
>>>>> R-sig-Geo using r-project.org
>>>>>  <mailto:
>>>>> R-sig-Geo using r-project.org
>>>>>>
>>>>>
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>>
>>>>     [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo using r-project.org
>>>>
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list