[R-sig-Geo] Issues with a GDB file in R?

Roger Bivand Roger@Biv@nd @ending from nhh@no
Sat May 5 13:02:45 CEST 2018


On Fri, 4 May 2018, Roger Bivand wrote:

> On Thu, 3 May 2018, Roger Bivand wrote:
>
>>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>
>>>   Hi again,
>>>
>>>   first of all, thanks a lot for your comments. It's becoming quite
>>>   interesting how to perform this task with such amount of data.
>>> 
>>>
>>>   Here is an update...
>>>
>>>   Cotton, I could read the geodatabase by using the commands I shared.
>>>   However, my R session crashed after overlaying it with a small set of
>>>   polygons. I guess it was because the size of the sp object (around 14.5
>>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
>>>   that
>>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
>>>   installing the sf package, but I will keep trying it.
>>>
>>>   Melanie, I already downloaded and read the geoTiff version. At first
>>>   sight, it seems that this object doesn’t have enough features as to
>>>   perform the overlying. I might have a biased idea of how raster objects
>>>   work - I'm too stick to the representation of shapefiles which sounds
>>>   quite related with Roger's idea. So I will start to explore it in order
>>>   to
>>>   gain a bit more understanding on it.
>>>
>>>   Roger, I certainly need to know which category is associated with each
>>>   CLC
>>>   polygon in order to compute the area covered by each of these classes
>>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
>>>   overlay, right?
>>
>>  On the contrary. The geoTiffs are coded as described in the documentation.
>>  Your output is simply the count of raster cells by NUTS3 for each of the
>>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
>>  a lot of sea and no data because of the French overseas territories.
>>
>>  You could possibly use PostGIS to do the intersections, or use GRASS for
>>  raster-vector counts (say through rgrass7). In R, you would want to add a
>>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
>>
>>  I would suggest using GRASS as the most obvious route, reading the raster,
>>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
>>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
>>  get 32K output categories, the 48 corine categories times the count of
>>  NUTS3 regions minus the nulls (there aren't many glaciers in most
>>  regions). You'll then need to match back the Corine codes and the NUTS3
>>  codes - see in the category label file shown by r.category.
>>
>>  I'll try to provide code tomorrow.
>
> Combining R and GRASS seems to work, but no guarantees.
>
> library(sp)
> library(rgdal)
> Gi <- GDALinfo("g250_clc12_V18_5.tif")
> makeSG <- function(x) {
>   stopifnot(class(x) == "GDALobj")
>   p4 <- attr(x, "projection")
>   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
>     c(x[2], x[1]))
>  SpatialGrid(gt, CRS(p4))
> }
> SG <- makeSG(Gi)
> nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
> nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
> library(rgrass7)
> td <- tempdir()
> iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
> # your GRASS will be where you installed it
> writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
> execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
>  flags="o")
> execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
>  label_column="FID")
> execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
> r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
>  intern=TRUE)
> r_stats1 <- gsub("\\*", "NA", r_stats0)
> con_stats <- textConnection(r_stats1)
> stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
>  "area"), colClasses=c("integer", "numeric"))
> close(con_stats)
> r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
> r_cats1 <- gsub(";", "", r_cats0)
> r_cats2 <- gsub("\t", " ", r_cats1)
> r_cats3 <- gsub("no data", "no_data", r_cats2)
> r_cats4 <- gsub("category ", "", r_cats3)
> r_cats4[1] <- paste0(r_cats4[1], "NA NA")
> r_cats_split <- strsplit(r_cats4, " ")
> cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
>   nuts=sapply(r_cats_split, "[", 2),
>   corine=as.integer(sapply(r_cats_split, "[", 3)))
> catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
> agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
>  sum)
> library(foreign)
> corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
> o <- match(colnames(agg_areas), as.character(corine_labels$Value))
> colnames(agg_areas) <- corine_labels$LABEL3[o]
> agg_areas_df <- as.data.frame(agg_areas)
> agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
>   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
>
> This should be ready to merge with the NUTS3 boundaries, if needed.
>
> agg_areas_df1$FID <- row.names(agg_areas_df1)
> nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
>

And to write out as GPKG:

names(nuts_corine)[1] <- "FID_"
writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")

It seems that at least this driver treats "FID" as a reserved field name.

> For the vector parts you could use sf and the provisional rgrass7sf on 
> github, but that wouldn't yet let you construct a skeleton SpatialGrid to 
> define the GRASS location. Using GRASS for the heavy lifting (the raster is 
> 51000 by 35000), and avoiding vector for overlay, this doesn't need much 
> memory (GRASS handles rasters by row). The GRASS temporary location only 
> takes 130MB of disk space. You could go for the 100m raster resolution, but I 
> doubt that the outcome would vary much - anyone like to try?
>
> If the sub-polygons by NUTS and corine categories are actually needed, the 
> output of r.cross could be passed to r.to.vect:
>
> execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
>   type="area")
>
> but this is more demanding in memory terms.

Reading into R is not possible, object too large.

Roger

>
> Interesting case, and it does show that combining GIS and R delivers the 
> goods - SAGA would probably work equivalently.
>
> Roger
>
>>
>>  Roger
>> 
>>>
>>>   I really appreciate any feedback in advance as well as details that I
>>>   should take into account to understand more about how to work with this
>>>   kind of data. I will also keep you up to date on how it goes if you
>>>   like.
>>> 
>>>
>>>   Best,
>>>   Roman.
>>>
>>>   On 03/05/2018, 12:26, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:
>>>
>>>      On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>>
>>>    >   Hello Shaun,
>>>    >
>>>    >   Thanks a lot for replying and providing me alternative options.
>>>    >
>>>    >
>>>    >   Unfortunately, I can't try both options anymore as I ran out of
>>>    >   ArcGIS
>>>    >   due to I was using a university PC which is not available now (I
>>>    >   installed an ArcGIS trial version there), but I'll try it later. I
>>>    >   also
>>>    >   failed on installing the sf package - I'll dig a bit more on it.
>>>    >   Nevertheless, I already downloaded the SpatialLite version so I'll
>>>    >   try
>>>    >   the second option and I'll let you know how it goes.
>>>
>>>      I think Melanie got it right - the apparent extra detail and
>>>      precision
>>>      you'd get from vector-vector overlay is illusory, so going with
>>>      geoTiff
>>>      should get you there (you can check to see whether 100m resolution
>>>      differs
>>>      from 250m). The only reason to choose vector-vector would be that the
>>>      Corine vector contains categories not represented in the raster
>>>      version.
>>>      Using raster also steps around the polygon deficiencies in the GDB
>>>      (and
>>>      probably SQLite) representations.
>>>
>>>      Roger
>>>
>>>    >
>>>    >
>>>    >   Regards,
>>>    >   Roman.
>>>    >
>>>    >   On 02/05/2018, 18:53, "Shaun Walbridge" <SWalbridge at esri.com>
>>>    >   wrote:
>>>    >
>>>    >      Hello Roman,
>>>    >
>>>    >      A couple of suggested options: You can try and use the
>>>    >      arcgisbinding package [1] to pull the data directly into R via
>>>    >      ArcGIS, as you mentioned you have ArcGIS available [presumably
>>>    >      on
>>>    >      Windows or in a VM]. It will access the data directly, and let
>>>    >      you
>>>    >      create sp and sf objects out of Geodatabases with little work. A
>>>    >      basic workflow looks like this:
>>>    >
>>>    >      d <- arc.open("path/file.gdb/layer_name")
>>>    >      df <- arc.select(d) # here, you can filter columns and
>>>    >      attributes,
>>>    >      see [2]
>>>    >      # create an sf object
>>>    >      df.sf <- arc.data2sf(df)
>>>    >      # alternatively, create an sp object
>>>    >      df.sp <- arc.data2sp(df)
>>>    >
>>>    >      You can also write the results back to a geodatabase, or any
>>>    >      other
>>>    >      format that ArcGIS understands. Another option is trying the
>>>    >      SpatiaLite version of the data at the URL you posted. You should
>>>    >      be able to access this using R directly, provided your rgdal
>>>    >      installation is correctly built to read spatialite databases. If
>>>    >      it is, try the same process you mentioned using rgdal, but point
>>>    >      it at the SpatialLite database instead. You could also use
>>>    >      command-line OGR to convert the data, that has a few options
>>>    >      like
>>>    >      where clause filtering that aren't directly available via
>>>    >      readOGR.
>>>    >
>>>    >      If neither of these options work, let me know and I can convert
>>>    >      the data for you into a format of your preference.
>>>    >
>>>    >      Cheers,
>>>    >      Shaun
>>>    >
>>>    >      1.
>>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_R-2DArcGIS_r-2Dbridge-2Dinstall&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=nRO2IG9dlrdKYGMSp3rCf7nwJqRFQoIWnY5zSFrPOQM&e=
>>>    >      2.
>>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__rdrr.io_github_R-2DArcGIS_r-2Dbridge_man_arc.select.html&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=89N-cLe5N3dhZ6MTsM2OAYwmDImBH88ZrVaA5Hos0n4&e=
>>>    >
>>>    >
>>>    >
>>>    >      On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez,
>>>    >      Roman"
>>>    >      <r-sig-geo-bounces at r-project.org on behalf of
>>>    >      ra454 at exeter.ac.uk>
>>>    >      wrote:
>>>    >
>>>    >          Hi everyone,
>>>    >
>>>    >          I’ve been struggling with a ESRI Geodatabase file
>>>    >          (clc12_Version_18_5.gdb) which is a layer of land cover
>>>    >          classes available on
>>>    >
>>>    >          https://urldefense.proofpoint.com/v2/url?u=https-3A__land.copernicus.eu_pan-2Deuropean_corine-2Dland-2Dcover_clc-2D2012-3Ftab-3Ddownload&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=9LaO4HiB5C5ewiuN_IeSQJSgq2tl5_-oMECPChtZa_U&e=
>>>    >
>>>    >          whose size is 2.61 GB once unzipped.
>>>    >
>>>    >          My ultimate task is to overlay it with the last NUTS3
>>>    >          administrative boundaries shapefile (2013) available on
>>>    >
>>>    >          https://urldefense.proofpoint.com/v2/url?u=http-3A__ec.europa.eu_eurostat_web_gisco_geodata_reference-2Ddata_administrative-2Dunits-2Dstatistical-2Dunits&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=VExYSoR8FY_vhZaPNJpoOx0ZCZNMU9hMTtlGJZj2joU&e=
>>>    >
>>>    >          in order to compute the area covered by each class within
>>>    >          each
>>>    >          NUTS3 region.
>>>    >
>>>    >          Despite the ease of friendly software for performing this
>>>    >          task, haven’t been capable of doing it - GRASS didn’t load
>>>    >          the
>>>    >          file as the log reports problems with the polygons, QGIS
>>>    >          shows
>>>    >          a warning regarding a specific object and ArcGIS got frozen.
>>>    >          I
>>>    >          guess it’s because the PC I used doesn’t have enough
>>>    >          capacity.
>>>    >          Unfortunately, I don’t have access to a more powerful one.
>>>    >
>>>    >          Anyway, I decided to try with R -after all, I’ll perform my
>>>    >          analysis with it. So I started exploring this GDB with
>>>    >          rgdal:
>>>    >
>>>    >          ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>>    >
>>>    >          Source:
>>>    >          "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
>>>    >          layer: "clc12_Version_18_5"
>>>    >          Driver: OpenFileGDB;
>>>    >          number of rows: 2370829
>>>    >          Feature type: wkbPolygon with 3 dimensions
>>>    >          Extent: (-2693292 -3086662) - (10037210 5440568)
>>>    >          Null geometry IDs: 2156240
>>>    >          CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000
>>>    >          +y_0=3210000
>>>    >          +ellps=GRS80 +units=m +no_defs
>>>    >          Number of fields: 6
>>>    >          name                      type  length  typeName
>>>    >          1 code_12              4       3           String
>>>    >          2 ID                         4       18         String
>>>    >          3 Remark               4       20         String
>>>    >          4 Area_Ha             2       0           Real
>>>    >          5 Shape_Length   2       0           Real
>>>    >          6 Shape_Area       2       0           Real
>>>    >
>>>    >          Then I tried to load it typing
>>>    >
>>>    >          clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>>    >
>>>    >          After 3-5 minutes, it appears the following text:
>>>    >
>>>    >          OGR data source with driver: OpenFileGDB
>>>    >          Source:
>>>    >          "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
>>>    >          layer:
>>>    >          "clc12_Version_18_5"
>>>    >          with 2370829 features
>>>    >          It has 6 fields
>>>    >
>>>    >          Unfortunately, after trying 5 times (each one took around 8
>>>    >          hours) I couldn’t get anything but the following messages:
>>>    >
>>>    >          Warning messages:
>>>    >          1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>>    >          "clc12_Version_18_5") :
>>>    >          Dropping null geometries: 2156240
>>>    >          2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>>    >          "clc12_Version_18_5") :
>>>    >          Z-dimension discarded
>>>    >
>>>    >          Could anyone advise me how to tackle it? I also would
>>>    >          appreciate suggestions on how to work with geodatabases -
>>>    >          it’s
>>>    >          my first time I work with these kind of files so I don’t
>>>    >          even
>>>    >          know their structure.
>>>    >
>>>    >          By the way, these are my computer and software
>>>    >          specifications:
>>>    >
>>>    >          MacBook Pro 2012
>>>    >          Processor 2.6 GHz Intel Core i7
>>>    >          Memory 16 GB DDR3
>>>    >          R 3.5.0
>>>    >          RStudio 1.1.447
>>>    >
>>>    >
>>>    >          Best,
>>>    >          Roman.
>>>    >
>>>    >
>>>    >
>>>    >
>>>    >   _______________________________________________
>>>    >   R-sig-Geo mailing list
>>>    >   R-sig-Geo at r-project.org
>>>    >   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>      --
>>>      Roger Bivand
>>>      Department of Economics, Norwegian School of Economics,
>>>      Helleveien 30, N-5045 Bergen, Norway.
>>>      voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>>>      http://orcid.org/0000-0003-2392-6140
>>>      https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>> 
>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


More information about the R-sig-Geo mailing list