[R-sig-Geo] Issues with a GDB file in R?
Roger Bivand
Roger.Bivand at nhh.no
Sat May 5 13:02:45 CEST 2018
On Fri, 4 May 2018, Roger Bivand wrote:
> On Thu, 3 May 2018, Roger Bivand wrote:
>
>> On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>
>>> Hi again,
>>>
>>> first of all, thanks a lot for your comments. It's becoming quite
>>> interesting to work out how to perform this task with such an amount of
>>> data.
>>>
>>>
>>> Here is an update...
>>>
>>> Cotton, I could read the geodatabase using the commands I shared.
>>> However, my R session crashed after overlaying it with a small set of
>>> polygons. I guess it was because of the size of the sp object (around
>>> 14.5 GB). It's also worth mentioning that it was just a "multipolygon"
>>> (is that the correct word?) formed by 2370829 polygons. I haven't
>>> succeeded in installing the sf package, but I will keep trying.
>>>
>>> Melanie, I already downloaded and read the geoTiff version. At first
>>> sight, it seems that this object doesn't have enough features to
>>> perform the overlay. I might have a biased idea of how raster objects
>>> work - I'm too attached to the shapefile representation, which sounds
>>> quite related to Roger's idea. So I will start to explore it in order
>>> to gain a bit more understanding of it.
>>>
>>> Roger, I certainly need to know which category is associated with each
>>> CLC polygon in order to compute the area covered by each of these
>>> classes within each NUTS3 region. Therefore, I still need to use a
>>> vector-vector overlay, right?
>>
>> On the contrary. The geoTiffs are coded as described in the documentation.
>> Your output is simply the count of raster cells by NUTS3 for each of the
>> categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
>> a lot of sea and no data because of the French overseas territories.
>>
>> You could possibly use PostGIS to do the intersections, or use GRASS for
>> raster-vector counts (say through rgrass7). In R, you would want to add a
>> NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
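>>
>> A minimal sketch of that in-R route (an assumption-laden sketch, not
>> tested at this scale: it assumes the raster package, uses the file
>> names from this thread, and needs rasterize() and crosstab() to cope
>> with a 51000 by 35000 grid in memory):
>>
>> library(raster)
>> corine <- raster("g250_clc12_V18_5.tif")
>> nuts_ll <- shapefile("NUTS_RG_01M_2013_4326_LEVL_3.shp")
>> nuts_laea <- spTransform(nuts_ll, crs(corine))
>> # burn a NUTS3 ID band into a raster aligned with the land cover grid
>> nuts_r <- rasterize(nuts_laea, corine, field=seq_len(nrow(nuts_laea)))
>> # cross-tabulate cell counts by (NUTS3 ID, Corine class);
>> # multiply counts by prod(res(corine)) to convert to areas
>> tab <- crosstab(nuts_r, corine)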
>>
>> I would suggest using GRASS as the most obvious route, reading the raster,
>> reading the NUTS3 boundaries into a separate location, projecting to LAEA,
>> then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
>> get 32K output categories, the 48 corine categories times the count of
>> NUTS3 regions minus the nulls (there aren't many glaciers in most
>> regions). You'll then need to match the Corine codes and the NUTS3
>> codes back - see the category label file shown by r.category.
>>
>> I'll try to provide code tomorrow.
>
> Combining R and GRASS seems to work, but no guarantees.
>
> library(sp)
> library(rgdal)
> Gi <- GDALinfo("g250_clc12_V18_5.tif")
> makeSG <- function(x) {
> stopifnot(class(x) == "GDALobj")
> p4 <- attr(x, "projection")
> gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
> c(x[2], x[1]))
> SpatialGrid(gt, CRS(p4))
> }
> SG <- makeSG(Gi)
> nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
> nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
> library(rgrass7)
> td <- tempdir()
> iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
> # your GRASS will be where you installed it
> writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
> execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
> flags="o")
> execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
> label_column="FID")
> execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
> r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
> intern=TRUE)
> r_stats1 <- gsub("\\*", "NA", r_stats0)
> con_stats <- textConnection(r_stats1)
> stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
> "area"), colClasses=c("integer", "numeric"))
> close(con_stats)
> r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
> r_cats1 <- gsub(";", "", r_cats0)
> r_cats2 <- gsub("\t", " ", r_cats1)
> r_cats3 <- gsub("no data", "no_data", r_cats2)
> r_cats4 <- gsub("category ", "", r_cats3)
> r_cats4[1] <- paste0(r_cats4[1], "NA NA")
> r_cats_split <- strsplit(r_cats4, " ")
> cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
> nuts=sapply(r_cats_split, "[", 2),
> corine=as.integer(sapply(r_cats_split, "[", 3)))
> catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
> agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
> sum)
> library(foreign)
> corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
> o <- match(colnames(agg_areas), as.character(corine_labels$Value))
> colnames(agg_areas) <- corine_labels$LABEL3[o]
> agg_areas_df <- as.data.frame(agg_areas)
> agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
> as.character(nuts_ll$FID))),] # dropping "NA" "no_data"
>
> This should be ready to merge with the NUTS3 boundaries, if needed.
>
> agg_areas_df1$FID <- row.names(agg_areas_df1)
> nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
>
And to write out as GPKG:
names(nuts_corine)[1] <- "FID_"
writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
It seems that at least this driver treats "FID" as a reserved field name.
> For the vector parts you could use sf and the provisional rgrass7sf on
> github, but that wouldn't yet let you construct a skeleton SpatialGrid to
> define the GRASS location. Using GRASS for the heavy lifting (the raster is
> 51000 by 35000), and avoiding vector for overlay, this doesn't need much
> memory (GRASS handles rasters by row). The GRASS temporary location only
> takes 130MB of disk space. You could go for the 100m raster resolution, but I
> doubt that the outcome would vary much - anyone like to try?
>
> If the sub-polygons by NUTS and corine categories are actually needed, the
> output of r.cross could be passed to r.to.vect:
>
> execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
> type="area")
>
> but this is more demanding in memory terms.
Reading the result into R is not possible - the object is too large.
Roger
>
> Interesting case, and it does show that combining GIS and R delivers the
> goods - SAGA would probably work equivalently.
>
> Roger
>
>>
>> Roger
>>
>>>
>>> I really appreciate in advance any feedback, as well as details I
>>> should take into account to understand more about how to work with
>>> this kind of data. I will also keep you up to date on how it goes if
>>> you like.
>>>
>>>
>>> Best,
>>> Roman.
>>>
>>> On 03/05/2018, 12:26, "Roger Bivand" <Roger.Bivand at nhh.no> wrote:
>>>
>>> On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>>
>>> > Hello Shaun,
>>> >
>>> > Thanks a lot for replying and providing me alternative options.
>>> >
>>> >
>>> > Unfortunately, I can't try either option any more, as I no longer
>>> > have access to ArcGIS - I was using a university PC which is not
>>> > available now (I had installed an ArcGIS trial version there) - but
>>> > I'll try it later. I also failed to install the sf package - I'll
>>> > dig into it a bit more. Nevertheless, I already downloaded the
>>> > SpatiaLite version, so I'll try the second option and let you know
>>> > how it goes.
>>>
>>> I think Melanie got it right - the apparent extra detail and
>>> precision
>>> you'd get from vector-vector overlay is illusory, so going with
>>> geoTiff
>>> should get you there (you can check to see whether 100m resolution
>>> differs
>>> from 250m). The only reason to choose vector-vector would be that the
>>> Corine vector contains categories not represented in the raster
>>> version.
>>> Using raster also steps around the polygon deficiencies in the GDB
>>> (and
>>> probably SQLite) representations.
>>>
>>> Roger
>>>
>>> >
>>> >
>>> > Regards,
>>> > Roman.
>>> >
>>> > On 02/05/2018, 18:53, "Shaun Walbridge" <SWalbridge at esri.com>
>>> > wrote:
>>> >
>>> > Hello Roman,
>>> >
>>> > A couple of suggested options: You can try and use the
>>> > arcgisbinding package [1] to pull the data directly into R via
>>> > ArcGIS, as you mentioned you have ArcGIS available [presumably
>>> > on
>>> > Windows or in a VM]. It will access the data directly, and let
>>> > you
>>> > create sp and sf objects out of Geodatabases with little work. A
>>> > basic workflow looks like this:
>>> >
>>> > d <- arc.open("path/file.gdb/layer_name")
>>> > df <- arc.select(d) # here, you can filter columns and
>>> > attributes,
>>> > see [2]
>>> > # create an sf object
>>> > df.sf <- arc.data2sf(df)
>>> > # alternatively, create an sp object
>>> > df.sp <- arc.data2sp(df)
>>> >
>>> > You can also write the results back to a geodatabase, or any
>>> > other
>>> > format that ArcGIS understands. Another option is trying the
>>> > SpatiaLite version of the data at the URL you posted. You should
>>> > be able to access this using R directly, provided your rgdal
>>> > installation is correctly built to read spatialite databases. If
>>> > it is, try the same process you mentioned using rgdal, but point
>>> > it at the SpatiaLite database instead. You could also use
>>> > command-line OGR (ogr2ogr) to convert the data; it has a few
>>> > options, like where-clause filtering, that aren't directly
>>> > available via readOGR.
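>>> >
>>> > For example, a hypothetical invocation of that kind (layer and
>>> > attribute names taken from the ogrInfo output in the original
>>> > post; requires a GDAL build with the OpenFileGDB driver):
>>> >
>>> > ogr2ogr -f "ESRI Shapefile" clc_subset.shp \
>>> >   clc12_Version_18_5.gdb clc12_Version_18_5 \
>>> >   -where "code_12 = '112'"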
>>> >
>>> > If neither of these options work, let me know and I can convert
>>> > the data for you into a format of your preference.
>>> >
>>> > Cheers,
>>> > Shaun
>>> >
>>> > 1. https://github.com/R-ArcGIS/r-bridge-install
>>> > 2. https://rdrr.io/github/R-ArcGIS/r-bridge/man/arc.select.html
>>> >
>>> >
>>> >
>>> > On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez,
>>> > Roman"
>>> > <r-sig-geo-bounces at r-project.org on behalf of
>>> > ra454 at exeter.ac.uk>
>>> > wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > I've been struggling with an ESRI Geodatabase file
>>> > (clc12_Version_18_5.gdb), which is a layer of land cover
>>> > classes available at
>>> >
>>> > https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=download
>>> >
>>> > whose size is 2.61 GB once unzipped.
>>> >
>>> > My ultimate task is to overlay it with the last NUTS3
>>> > administrative boundaries shapefile (2013) available on
>>> >
>>> > http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units
>>> >
>>> > in order to compute the area covered by each class within
>>> > each
>>> > NUTS3 region.
>>> >
>>> > Despite the availability of friendly software for performing this
>>> > task, I haven't been able to do it - GRASS didn't load the file
>>> > (the log reports problems with the polygons), QGIS shows a warning
>>> > about a specific object, and ArcGIS froze. I guess it's because
>>> > the PC I used doesn't have enough capacity. Unfortunately, I don't
>>> > have access to a more powerful one.
>>> >
>>> > Anyway, I decided to try with R - after all, I'll perform my
>>> > analysis with it. So I started exploring this GDB with rgdal:
>>> >
>>> > ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>> >
>>> > Source:
>>> > "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
>>> > layer: "clc12_Version_18_5"
>>> > Driver: OpenFileGDB;
>>> > number of rows: 2370829
>>> > Feature type: wkbPolygon with 3 dimensions
>>> > Extent: (-2693292 -3086662) - (10037210 5440568)
>>> > Null geometry IDs: 2156240
>>> > CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000
>>> > +y_0=3210000
>>> > +ellps=GRS80 +units=m +no_defs
>>> > Number of fields: 6
>>> > name type length typeName
>>> > 1 code_12 4 3 String
>>> > 2 ID 4 18 String
>>> > 3 Remark 4 20 String
>>> > 4 Area_Ha 2 0 Real
>>> > 5 Shape_Length 2 0 Real
>>> > 6 Shape_Area 2 0 Real
>>> >
>>> > Then I tried to load it by typing
>>> >
>>> > clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>> >
>>> > After 3-5 minutes, the following text appeared:
>>> >
>>> > OGR data source with driver: OpenFileGDB
>>> > Source:
>>> > "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
>>> > layer:
>>> > "clc12_Version_18_5"
>>> > with 2370829 features
>>> > It has 6 fields
>>> >
>>> > Unfortunately, after trying 5 times (each attempt took around 8
>>> > hours) I couldn't get anything but the following messages:
>>> >
>>> > Warning messages:
>>> > 1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>> > "clc12_Version_18_5") :
>>> > Dropping null geometries: 2156240
>>> > 2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>> > "clc12_Version_18_5") :
>>> > Z-dimension discarded
>>> >
>>> > Could anyone advise me on how to tackle this? I would also
>>> > appreciate suggestions on how to work with geodatabases - it's
>>> > the first time I've worked with this kind of file, so I don't
>>> > even know their structure.
>>> >
>>> > By the way, these are my computer and software
>>> > specifications:
>>> >
>>> > MacBook Pro 2012
>>> > Processor 2.6 GHz Intel Core i7
>>> > Memory 16 GB DDR3
>>> > R 3.5.0
>>> > RStudio 1.1.447
>>> >
>>> >
>>> > Best,
>>> > Roman.
>>> >
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > R-sig-Geo mailing list
>>> > R-sig-Geo at r-project.org
>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>>> http://orcid.org/0000-0003-2392-6140
>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>>
>>
>>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en