[R-sig-Geo] Issues with a GDB file in R?

Aguirre Perez, Roman ra454 at exeter.ac.uk
Sun May 6 15:16:24 CEST 2018


On 05/05/2018, 12:03, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:

    On Fri, 4 May 2018, Roger Bivand wrote:
    
    > On Thu, 3 May 2018, Roger Bivand wrote:
    >
    >>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
    >>
    >>>   Hi again,
    >>>
    >>>   First of all, thanks a lot for your comments. It's becoming quite
    >>>   interesting to work out how to perform this task with such a large
    >>>   amount of data.
    >>> 
    >>>
    >>>   Here is an update...
    >>>
    >>>   Cotton, I could read the geodatabase by using the commands I shared.
    >>>   However, my R session crashed after overlaying it with a small set of
    >>>   polygons. I guess it was because of the size of the sp object (around
    >>>   14.5 GB). It's also worth mentioning that it was just a "multipolygon"
    >>>   (is that the correct word?) formed by 2370829 polygons. I haven't
    >>>   succeeded in installing the sf package, but I will keep trying.
    >>>
    >>>   Melanie, I already downloaded and read the geoTiff version. At first
    >>>   sight, it seems that this object doesn't have enough features to
    >>>   perform the overlay. I might have a biased idea of how raster objects
    >>>   work - I'm too attached to the shapefile representation, which sounds
    >>>   quite related to Roger's point. So I will start exploring it in order
    >>>   to gain a bit more understanding of it.
    >>>
    >>>   Roger, I certainly need to know which category is associated with
    >>>   each CLC polygon in order to compute the area covered by each of
    >>>   these classes within each NUTS3 region. Therefore, I still need to
    >>>   use a vector-vector overlay, right?
    >>
    >>  On the contrary. The geoTiffs are coded as described in the documentation.
    >>  Your output is simply the count of raster cells by NUTS3 for each of the
    >>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
    >>  a lot of sea and no data because of the French overseas territories.
    >>
    >>  You could possibly use PostGIS to do the intersections, or use GRASS for
    >>  raster-vector counts (say through rgrass7). In R, you would want to add a
    >>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
    >>
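A rough sketch of that in-R counting route with the raster package (not the code Roger posted below), using the file names that appear in the thread; on the full 250 m grid the rasterize and crosstab steps will be slow and need plenty of temporary disk space:

    library(rgdal)   # readOGR
    library(raster)
    clc <- raster("g250_clc12_V18_5.tif")
    nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
    nuts_laea <- spTransform(nuts_ll, CRS(projection(clc)))
    # NUTS3 ID band: one integer per region, burnt onto the CLC grid
    nuts_r <- rasterize(nuts_laea, clc, field = seq_along(nuts_laea))
    # cross-tabulate cell counts by CLC class and NUTS3 index
    ct <- crosstab(clc, nuts_r)
    # convert counts to areas in square metres (250 m cells)
    areas_m2 <- ct * prod(res(clc))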
    >>  I would suggest using GRASS as the most obvious route, reading the raster,
    >>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
    >>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
    >>  get 32K output categories, the 48 corine categories times the count of
    >>  NUTS3 regions minus the nulls (there aren't many glaciers in most
    >>  regions). You'll then need to match back the Corine codes and the NUTS3
    >>  codes - see in the category label file shown by r.category.
    >>
    >>  I'll try to provide code tomorrow.
    >
    > Combining R and GRASS seems to work, but no guarantees.
    >
    > library(sp)
    > library(rgdal)
    > Gi <- GDALinfo("g250_clc12_V18_5.tif")
    > makeSG <- function(x) {
    >   stopifnot(class(x) == "GDALobj")
    >   p4 <- attr(x, "projection")
    >   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
    >     c(x[2], x[1]))
    >  SpatialGrid(gt, CRS(p4))
    > }
    > SG <- makeSG(Gi)
    > nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
    > nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
    > library(rgrass7)
    > td <- tempdir()
    > iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
    > # your GRASS will be where you installed it
    > writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
    > execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
    >  flags="o")
    > execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
    >  label_column="FID")
    > execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
    > r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
    >  intern=TRUE)
    > r_stats1 <- gsub("\\*", "NA", r_stats0)
    > con_stats <- textConnection(r_stats1)
    > stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
    >  "area"), colClasses=c("integer", "numeric"))
    > close(con_stats)
    > r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
    > r_cats1 <- gsub(";", "", r_cats0)
    > r_cats2 <- gsub("\t", " ", r_cats1)
    > r_cats3 <- gsub("no data", "no_data", r_cats2)
    > r_cats4 <- gsub("category ", "", r_cats3)
    > r_cats4[1] <- paste0(r_cats4[1], "NA NA")
    > r_cats_split <- strsplit(r_cats4, " ")
    > cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
    >   nuts=sapply(r_cats_split, "[", 2),
    >   corine=as.integer(sapply(r_cats_split, "[", 3)))
    > catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
    > agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
    >  sum)
    > library(foreign)
    > corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
    > o <- match(colnames(agg_areas), as.character(corine_labels$Value))
    > colnames(agg_areas) <- corine_labels$LABEL3[o]
    > agg_areas_df <- as.data.frame(agg_areas)
    > agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
    >   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
    >
    > This should be ready to merge with the NUTS3 boundaries, if needed.
    >
    > agg_areas_df1$FID <- row.names(agg_areas_df1)
    > nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
    >
    
    And to write out as GPKG:
    
    names(nuts_corine)[1] <- "FID_"
    writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
    
    It seems that at least this driver treats "FID" as a reserved field name.
    
    > For the vector parts you could use sf and the provisional rgrass7sf on 
    > github, but that wouldn't yet let you construct a skeleton SpatialGrid to 
    > define the GRASS location. Using GRASS for the heavy lifting (the raster is 
    > 51000 by 35000), and avoiding vector for overlay, this doesn't need much 
    > memory (GRASS handles rasters by row). The GRASS temporary location only 
    > takes 130MB of disk space. You could go for the 100m raster resolution, but I 
    > doubt that the outcome would vary much - anyone like to try?
    >
    > If the sub-polygons by NUTS and corine categories are actually needed, the 
    > output of r.cross could be passed to r.to.vect:
    >
    > execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
    >   type="area")
    >
    > but this is more demanding in memory terms.

Hi Roger,

thanks a lot again for helping me with this issue.


Unfortunately, I haven't managed to call GRASS within R, nor to install sf. My guess is that these issues arise because I'm using the newest versions of both OS X High Sierra and R 3.5.0, which are not yet stable.

    Reading into R is not possible, object too large.

Bearing in mind the previous issues plus this last comment, I decided to use a PC with another OS (Ubuntu), aiming to solve them and to work with a bit more memory (32 GB).
    
    Roger

In the meantime, I would really appreciate your guidance on gaining background in this subject. Could you please suggest resources to better understand the features of the objects involved in spatial statistics? I should mention that the knowledge I have about it is summarised in your book Applied Spatial Data Analysis with R.


Best,
Roman.    

    >
    > Interesting case, and it does show that combining GIS and R delivers the 
    > goods - SAGA would probably work equivalently.
    >
    > Roger
    >
    >>
    >>  Roger
    >> 
    >>>
    >>>   I really appreciate any feedback in advance as well as details that I
    >>>   should take into account to understand more about how to work with this
    >>>   kind of data. I will also keep you up to date on how it goes if you
    >>>   like.
    >>> 
    >>>
    >>>   Best,
    >>>   Roman.
    >>>
    >>>   On 03/05/2018, 12:26, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:
    >>>
    >>>      On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
    >>>
    >>>    >   Hello Shaun,
    >>>    >
    >>>    >   Thanks a lot for replying and providing me alternative options.
    >>>    >
    >>>    >
    >>>    >   Unfortunately, I can't try either option any more, as I no
    >>>    >   longer have access to ArcGIS - I was using a university PC which
    >>>    >   is not available now (I had installed an ArcGIS trial version
    >>>    >   there) - but I'll try it later. I also failed to install the sf
    >>>    >   package - I'll dig a bit more into it. Nevertheless, I have
    >>>    >   already downloaded the SpatiaLite version, so I'll try the second
    >>>    >   option and let you know how it goes.
    >>>
    >>>      I think Melanie got it right - the apparent extra detail and
    >>>      precision you'd get from vector-vector overlay is illusory, so
    >>>      going with geoTiff should get you there (you can check to see
    >>>      whether 100m resolution differs from 250m). The only reason to
    >>>      choose vector-vector would be that the Corine vector contains
    >>>      categories not represented in the raster version. Using raster
    >>>      also steps around the polygon deficiencies in the GDB (and
    >>>      probably SQLite) representations.
    >>>
    >>>      Roger
    >>>
    >>>    >
    >>>    >
    >>>    >   Regards,
    >>>    >   Roman.
    >>>    >
    >>>    >   On 02/05/2018, 18:53, "Shaun Walbridge" <SWalbridge at esri.com>
    >>>    >   wrote:
    >>>    >
    >>>    >      Hello Roman,
    >>>    >
    >>>    >      A couple of suggested options: You can try and use the
    >>>    >      arcgisbinding package [1] to pull the data directly into R via
    >>>    >      ArcGIS, as you mentioned you have ArcGIS available [presumably
    >>>    >      on Windows or in a VM]. It will access the data directly, and
    >>>    >      let you create sp and sf objects out of Geodatabases with
    >>>    >      little work. A basic workflow looks like this:
    >>>    >
    >>>    >      d <- arc.open("path/file.gdb/layer_name")
    >>>    >      df <- arc.select(d)  # here, you can filter columns and attributes, see [2]
    >>>    >      # create an sf object
    >>>    >      df.sf <- arc.data2sf(df)
    >>>    >      # alternatively, create an sp object
    >>>    >      df.sp <- arc.data2sp(df)
    >>>    >
    >>>    >      You can also write the results back to a geodatabase, or any
    >>>    >      other format that ArcGIS understands. Another option is trying
    >>>    >      the SpatiaLite version of the data at the URL you posted. You
    >>>    >      should be able to access this using R directly, provided your
    >>>    >      rgdal installation is correctly built to read SpatiaLite
    >>>    >      databases. If it is, try the same process you mentioned using
    >>>    >      rgdal, but point it at the SpatiaLite database instead. You
    >>>    >      could also use command-line OGR to convert the data, which has
    >>>    >      a few options, like where-clause filtering, that aren't
    >>>    >      directly available via readOGR.
    >>>    >
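A rough sketch of that command-line route driven from R via system2(); the SpatiaLite file and layer names and the class code (231, pastures) are assumptions for illustration only:

    # sketch only: extract one CLC class from the SpatiaLite download with
    # ogr2ogr and its -where filter (file/layer names and code assumed)
    system2("ogr2ogr",
            args = c("-f", "GPKG", "clc12_code231.gpkg",
                     "clc12_Version_18_5.sqlite", "clc12_Version_18_5",
                     "-where", shQuote("code_12 = '231'")))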
    >>>    >      If neither of these options work, let me know and I can convert
    >>>    >      the data for you into a format of your preference.
    >>>    >
    >>>    >      Cheers,
    >>>    >      Shaun
    >>>    >
    >>>    >      1. https://github.com/R-ArcGIS/r-bridge-install
    >>>    >      2. https://rdrr.io/github/R-ArcGIS/r-bridge/man/arc.select.html
    >>>    >
    >>>    >
    >>>    >
    >>>    >      On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez, Roman"
    >>>    >      <r-sig-geo-bounces at r-project.org on behalf of ra454 at exeter.ac.uk>
    >>>    >      wrote:
    >>>    >
    >>>    >          Hi everyone,
    >>>    >
    >>>    >          I've been struggling with an ESRI Geodatabase file
    >>>    >          (clc12_Version_18_5.gdb), which is a layer of land cover
    >>>    >          classes available on
    >>>    >
    >>>    >          https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=download
    >>>    >
    >>>    >          whose size is 2.61 GB once unzipped.
    >>>    >
    >>>    >          My ultimate task is to overlay it with the last NUTS3
    >>>    >          administrative boundaries shapefile (2013) available on
    >>>    >
    >>>    >          http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units
    >>>    >
    >>>    >          in order to compute the area covered by each class within
    >>>    >          each
    >>>    >          NUTS3 region.
    >>>    >
    >>>    >          Despite the availability of user-friendly software for
    >>>    >          performing this task, I haven't been able to do it - GRASS
    >>>    >          didn't load the file (the log reports problems with the
    >>>    >          polygons), QGIS shows a warning regarding a specific
    >>>    >          object, and ArcGIS froze. I guess it's because the PC I
    >>>    >          used doesn't have enough capacity. Unfortunately, I don't
    >>>    >          have access to a more powerful one.
    >>>    >
    >>>    >          Anyway, I decided to try with R - after all, I'll perform
    >>>    >          my analysis with it. So I started exploring this GDB with
    >>>    >          rgdal:
    >>>    >
    >>>    >          ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
    >>>    >
    >>>    >          Source:
    >>>    >          "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
    >>>    >          layer: "clc12_Version_18_5"
    >>>    >          Driver: OpenFileGDB;
    >>>    >          number of rows: 2370829
    >>>    >          Feature type: wkbPolygon with 3 dimensions
    >>>    >          Extent: (-2693292 -3086662) - (10037210 5440568)
    >>>    >          Null geometry IDs: 2156240
    >>>    >          CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000
    >>>    >          +y_0=3210000
    >>>    >          +ellps=GRS80 +units=m +no_defs
    >>>    >          Number of fields: 6
    >>>    >                    name type length typeName
    >>>    >          1      code_12    4      3   String
    >>>    >          2           ID    4     18   String
    >>>    >          3       Remark    4     20   String
    >>>    >          4      Area_Ha    2      0     Real
    >>>    >          5 Shape_Length    2      0     Real
    >>>    >          6   Shape_Area    2      0     Real
    >>>    >
    >>>    >          Then I tried to load it typing
    >>>    >
    >>>    >          clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
    >>>    >
    >>>    >          After 3-5 minutes, the following text appears:
    >>>    >
    >>>    >          OGR data source with driver: OpenFileGDB
    >>>    >          Source:
    >>>    >          "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
    >>>    >          layer:
    >>>    >          "clc12_Version_18_5"
    >>>    >          with 2370829 features
    >>>    >          It has 6 fields
    >>>    >
    >>>    >          Unfortunately, after trying 5 times (each one took around 8
    >>>    >          hours) I couldn’t get anything but the following messages:
    >>>    >
    >>>    >          Warning messages:
    >>>    >          1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
    >>>    >          "clc12_Version_18_5") :
    >>>    >          Dropping null geometries: 2156240
    >>>    >          2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
    >>>    >          "clc12_Version_18_5") :
    >>>    >          Z-dimension discarded
    >>>    >
    >>>    >          Could anyone advise me how to tackle this? I would also
    >>>    >          appreciate suggestions on how to work with geodatabases -
    >>>    >          this is the first time I have worked with this kind of
    >>>    >          file, so I don't even know how they are structured.
    >>>    >
    >>>    >          By the way, these are my computer and software
    >>>    >          specifications:
    >>>    >
    >>>    >          MacBook Pro 2012
    >>>    >          Processor 2.6 GHz Intel Core i7
    >>>    >          Memory 16 GB DDR3
    >>>    >          R 3.5.0
    >>>    >          RStudio 1.1.447
    >>>    >
    >>>    >
    >>>    >          Best,
    >>>    >          Roman.
    >>>    >
    >>>    >
    >>>    >
    >>>    >
    >>>    >   _______________________________________________
    >>>    >   R-sig-Geo mailing list
    >>>    >   R-sig-Geo at r-project.org
    >>>    >   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
    >>>
    >>>      --
    >>>      Roger Bivand
    >>>      Department of Economics, Norwegian School of Economics,
    >>>      Helleveien 30, N-5045 Bergen, Norway.
    >>>      voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
    >>>      http://orcid.org/0000-0003-2392-6140
    >>>      https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
    >>>
    >>> 
    >> 
    >> 
    >
    >
    
    -- 
    Roger Bivand
    Department of Economics, Norwegian School of Economics,
    Helleveien 30, N-5045 Bergen, Norway.
    voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
    http://orcid.org/0000-0003-2392-6140
    https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


