[R-sig-Geo] readOGR workaround for Japanese UTF-8 geojson
Roger Bivand
Roger.Bivand at nhh.no
Mon Jun 29 09:32:27 CEST 2020
On Mon, 29 Jun 2020, Alan Engel wrote:
> I am working on a project https://github.com/AlanInTsukuba/jpucd that
> involves extracting shapefiles and property data from Japanese geojson
> files. When reading with readOGR(ibarakipath1, encoding="UTF-8",
> use_iconv=TRUE), I find that subsets of the data cannot be written
> with writeOGR without losing the text fields that contain Japanese.
> I found the following workaround, but wonder if there is a better way
> to do this.
>
Firstly, the ESRI Shapefile driver should only be used for reading legacy
files with known text encodings. Shapefiles store attribute data in DBF
files, a format that should no longer be used in new work: DBF limits
field names to 10 characters, stores numerical data imprecisely, and has
big problems storing any text that is not ASCII (see
https://cran.r-project.org/web/packages/rgdal/vignettes/OGR_shape_encoding.pdf).
All new projects must use more modern formats, preferably GeoPackage
(GPKG, http://www.geopackage.org/spec/), which resolves all of the
problems mentioned.
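A minimal sketch of the field-name and text problems described above, assuming a recent sf with GDAL's ESRI Shapefile and GPKG drivers (sf is the package recommended further down). The layer, its field name CITY_NAME_JAPANESE and the coordinates are hypothetical, made up only for illustration:

```r
library(sf)

# Hypothetical layer: one field name longer than DBF's 10-character limit,
# holding Japanese municipality names (written with \u escapes so the
# script itself stays ASCII).
pts <- st_sf(
  CITY_NAME_JAPANESE = c("\u3064\u304f\u3070\u5e02", "\u5b88\u8c37\u753a"),
  geometry = st_sfc(st_point(c(140.1, 36.1)), st_point(c(140.0, 35.95)),
                    crs = 4326)
)

shp  <- tempfile(fileext = ".shp")
gpkg <- tempfile(fileext = ".gpkg")

# The long field name gets shortened before it reaches the DBF (sf and/or
# GDAL warn about it, and about re-encoding the text), so silence warnings.
suppressWarnings(st_write(pts, shp, quiet = TRUE))
st_write(pts, gpkg, quiet = TRUE)

from_shp  <- st_read(shp, quiet = TRUE)
from_gpkg <- st_read(gpkg, quiet = TRUE)

names(from_shp)   # attribute name shortened to at most 10 characters
names(from_gpkg)  # "CITY_NAME_JAPANESE" kept; GPKG stores text as UTF-8
```

The GeoPackage copy round-trips both the field name and the Japanese strings, which is the practical sense in which GPKG "resolves" the DBF restrictions.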
If your project uses R on Windows, you also need to be aware that R on
Windows is moving towards UTF-8 in order to reduce internal and
cross-platform encoding problems (see
https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html).
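A small base-R sketch of the distinction the workaround in the quoted message relies on: Encoding() only declares which encoding a string's bytes are in, whereas iconv() actually converts them. The byte values below are simply the UTF-8 encoding of つくば市, standing in (as an assumption) for a string that arrives unmarked from a reader that does no conversion; l10n_info() reports whether the session's native encoding is UTF-8:

```r
# Does this R session use UTF-8 as its native encoding?
isTRUE(l10n_info()[["UTF-8"]])

# The UTF-8 bytes of "つくば市", as they might arrive unmarked from a
# reader that performs no conversion.
x <- rawToChar(as.raw(c(0xe3, 0x81, 0xa4, 0xe3, 0x81, 0x8f,
                        0xe3, 0x81, 0xb0, 0xe5, 0xb8, 0x82)))
Encoding(x)                      # "unknown": the bytes carry no mark

# Declare (not convert) the bytes as UTF-8, as the quoted loop does per column.
Encoding(x) <- "UTF-8"
Encoding(x)                      # "UTF-8"
x == "\u3064\u304f\u3070\u5e02"  # TRUE: same characters as the escaped literal
```

On a session whose native encoding is not UTF-8 (typical of R on Windows in 2020), an unmarked string like the first x is interpreted in the native code page, which is why marking, or moving R itself to UTF-8, matters.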
Finally, you should start new work with the sf workflow, not sp/rgdal.
sp/rgdal are being maintained only to support their reverse dependencies,
especially for spatial vector data, for which sf provides full support.
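A minimal sketch of the quoted task redone in the sf workflow, assuming sf is installed. To stay self-contained it reads a tiny inline GeoJSON string (sf documents that a GeoJSON dsn may be the text itself) with made-up point coordinates in place of the jpucd file; with the real data, dsn would be ibarakipath1. No Encoding() fix-up step is included, on the assumption that sf marks the strings it reads from GDAL's GeoJSON driver as UTF-8:

```r
library(sf)

# Stand-in for JPGen2005CTgenlCY2000P08Ibaraki.geojson: two point features
# with a CITY_NAME property (made-up coordinates; \u escapes keep this ASCII).
geojson <- paste0(
  '{"type": "FeatureCollection", "features": [',
  '{"type": "Feature", "properties": {"CITY_NAME": "\u3064\u304f\u3070\u5e02"},',
  ' "geometry": {"type": "Point", "coordinates": [140.1, 36.1]}},',
  '{"type": "Feature", "properties": {"CITY_NAME": "\u6c34\u6238\u5e02"},',
  ' "geometry": {"type": "Point", "coordinates": [140.47, 36.37]}}]}'
)
ibaraki <- st_read(geojson, quiet = TRUE)

# The four Tsukuba Express municipalities named in the quoted code.
tx_cities <- c("\u3064\u304f\u3070\u5e02", "\u5b88\u8c37\u753a",
               "\u4f0a\u5948\u753a", "\u8c37\u548c\u539f\u6751")
tx2000 <- ibaraki[ibaraki$CITY_NAME %in% tx_cities, ]

# Write to GeoPackage rather than a shapefile; the text stays UTF-8.
out <- tempfile(fileext = ".gpkg")
st_write(tx2000, out, layer = "TsukubaExpressCensusDistricts2000",
         quiet = TRUE)
st_read(out, quiet = TRUE)$CITY_NAME  # only the Tsukuba feature remains
```

Using %in% instead of a chain of == and | also keeps rows with a missing CITY_NAME out of the subset rather than producing NA row indices.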
Roger
>
> Environment: RGui, Windows 10
>
> # load ibaraki shapefiles, extract TX subset, write to geojson
> library(rgdal)  # readOGR(), writeOGR()
> library(jpucd)
> shppath <- system.file("extdata", package = "jpucd")
> ibarakipath1 <-
>   paste(shppath, "JPGen2005CTgenlCY2000P08Ibaraki.geojson", sep = "/")
>
> #^ JPGen2005CTgenlCY2000P08Ibaraki.geojson is a UTF-8 encoded geojson file
> #^ having Japanese names in property fields. To be able to
> #^ read these fields, they need to be converted (to Shift-JIS?).
> #^ The following command does this.
> #^ This can also be done by use_iconv=FALSE and setting
> #^ the encoding of the Japanese columns using Encoding(x) <- "UTF-8".
>
> ibaraki <- readOGR(ibarakipath1, encoding = "UTF-8", use_iconv = FALSE)
> ## use_iconv=TRUE loads so that the Japanese fields are readable,
> ## but writeOGR doesn't write them.
> head(ibaraki@data)
>
> #^ Apply Encoding(x) <- "UTF-8" to every character column
> for (name in names(ibaraki@data)[sapply(ibaraki@data, is.character)]) {
>   Encoding(ibaraki@data[[name]]) <- "UTF-8"
> }
>
> #^ Get TX subset
> tx_cities <- c("つくば市", "守谷町", "伊奈町", "谷和原村")
> tx2000 <- ibaraki[ibaraki@data$CITY_NAME %in% tx_cities, ]
> head(tx2000@data)
>
> #^ Write it.
> dsn <- "TsukubaExpressCensusDistricts2000.geojson"
> writeOGR(tx2000, dsn, layer = "TsukubaExpressCensusDistricts2000",
>          driver = "GeoJSON", dataset_options = NULL,
>          layer_options = NULL, verbose = FALSE, check_exists = NULL,
>          overwrite_layer = FALSE, delete_dsn = FALSE, morphToESRI = NULL,
>          encoding = "UTF-8")
>
> Thank you.
>
> Alan
>
> https://alanintsukuba.github.io/
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en