[R-sig-Geo] Convert geojson file to R

Barry Rowlingson b.rowlingson using gmail.com
Tue Nov 29 12:56:57 CET 2022


Ah ha, it's a file with no line endings. Count the lines, zero:

$ wc -l countrymasks.geojson
0 countrymasks.geojson
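
For anyone without `wc` to hand, the same count can be sketched in base R. This is only an illustration: `count_newlines` is a made-up helper, and the temporary file is a stand-in for countrymasks.geojson, not the real data.

```r
# Count newline bytes in a file, which is what `wc -l` reports.
# `count_newlines` is a hypothetical helper, not part of any package.
count_newlines <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0x0a))
}

# Stand-in file: a single line of JSON with no trailing newline.
tmp <- tempfile(fileext = ".geojson")
writeBin(charToRaw('{"type":"FeatureCollection","features":[]}'), tmp)
n_before <- count_newlines(tmp)

# Append one line ending and count again.
con <- file(tmp, open = "ab")
writeBin(as.raw(0x0a), con)
close(con)
n_after <- count_newlines(tmp)

print(c(before = n_before, after = n_after))
```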

Adding a line ending to the file still produces the error, though.
Hmm. Even running it through `json_pp` to get it on multiple lines
results in the error:


$ more countrymasks_pp.geojson
{
   "features" : [
      {
         "geometry" : {
            "coordinates" : [
...


> cm = sf::st_read("./countrymasks_pp.geojson")
Reading layer `countrymasks_pp' from data source
  `/nobackup/rowlings/Downloads/SO/countrymasks_pp.geojson' using
driver `GeoJSON'
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  attempt to set index 210/210 in SET_STRING_ELT

Weirdly, `ogrinfo` thinks there are 210 features, but `st_read`
with my "query" fix gets 214...


$ ogrinfo -so -al countrymasks_pp.geojson
INFO: Open of `countrymasks_pp.geojson'
      using driver `GeoJSON' successful.

Layer name: countrymasks_pp
Geometry: Unknown (any)
Feature Count: 210   <--- 210 features


> cm = sf::st_read("./countrymasks_pp.geojson", query="select * from countrymasks_pp where 1 = 1")
Reading query `select * from countrymasks_pp where 1 = 1'
from data source
`/nobackup/rowlings/Downloads/SO/countrymasks_pp.geojson' using driver
`GeoJSON'
Simple feature collection with 214 features and 15 fields
Geometry type: MULTIPOLYGON
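
One way to make that comparison from inside R, as a rough sketch: `st_layers()` reports the feature count the driver advertises, which can be set against the number of rows the query-based read returns. The two-feature file and the name `tinymasks` are invented for illustration; they are not expected to reproduce the 210/214 mismatch, only to show the mechanics of checking for it.

```r
library(sf)

# Hypothetical two-feature file with deliberately uneven properties.
geojson_txt <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"NAME": "A", "km2_tot": 10, "isocode": "AA"},
     "geometry": {"type": "Point", "coordinates": [0, 0]}},
    {"type": "Feature",
     "properties": {"NAME": "B"},
     "geometry": {"type": "Point", "coordinates": [1, 1]}}
  ]
}'
path <- file.path(tempdir(), "tinymasks.geojson")
writeLines(geojson_txt, path)

# Ask the driver for the layer name and the feature count it reports.
layers <- st_layers(path)
layer <- layers$name[1]
n_reported <- layers$features[1]

# Read through an always-true SQL filter, as in the workaround above.
query <- sprintf("select * from %s where 1 = 1", layer)
tiny <- st_read(path, query = query, quiet = TRUE)
n_read <- nrow(tiny)

print(c(reported = n_reported, read = n_read))
```

If the two numbers disagree on a real file, that points at the driver's bookkeeping rather than at the data frame you end up with.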


I now notice some of the rows have incomplete property sets, for
example Paraguay has:

        "properties" : {
            "ISIPEDIA" : "PRY",
            "NAME" : "Paraguay",
            "an_crop" : "t",
            "an_range" : "t",
            "asap0_id" : 178,
            "asap_cntry" : "f",
            "g1_units" : 17,
            "isocode" : "PY",
            "km2_crop" : 80032,
            "km2_rang2" : 20908,
            "km2_tot" : 399367,
            "name0" : "Paraguay",
            "name0_shr" : "Paraguay"
         },

but Palestine has:

         "properties" : {
            "ISIPEDIA" : "PSE",
            "NAME" : "Palestine, State of",
            "isocode" : "PS",
            "km2_crop" : 84,
            "km2_rang2" : 3423,
            "km2_tot" : 6224,
            "name0" : "Palestine, State of"
         }

Then there's "Caribbean island small states", which has a vector in its properties:

       "properties" : {
            "ISIPEDIA" : "CSID",
            "NAME" : "Caribbean island small states",
            "country_codes" : [
               "BRB",
               "DMA",
               "VIR",
               "BHS",
               "GRD",
               "ATG",
               "ANT",
               "LCA",
               "BLZ",
               "CYM",
               "VCT"
            ]

I wonder if these irregularities are causing odd problems with the
various gdal ways of parsing this...
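
A quick way to audit that sort of irregularity without going through GDAL at all is to parse the JSON directly. The sketch below assumes the jsonlite package and uses a cut-down, made-up three-feature sample (geometries set to null) rather than the real file; it lists which property keys each feature lacks and which properties hold arrays instead of scalars.

```r
library(jsonlite)

# Hypothetical miniature of the file: three features with uneven properties.
txt <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null,
     "properties": {"ISIPEDIA": "PRY", "NAME": "Paraguay",
                    "isocode": "PY", "an_crop": "t"}},
    {"type": "Feature", "geometry": null,
     "properties": {"ISIPEDIA": "PSE", "NAME": "Palestine, State of",
                    "isocode": "PS"}},
    {"type": "Feature", "geometry": null,
     "properties": {"ISIPEDIA": "CSID", "NAME": "Caribbean island small states",
                    "country_codes": ["BRB", "DMA"]}}
  ]
}'

# simplifyVector = FALSE keeps JSON arrays as lists, so they are easy to spot.
features <- fromJSON(txt, simplifyVector = FALSE)$features
props <- lapply(features, function(f) f$properties)
ids <- vapply(props, function(p) p$ISIPEDIA, character(1))

# Union of all property names seen anywhere in the file.
all_keys <- unique(unlist(lapply(props, names)))

# Keys each feature is missing, relative to that union.
missing_keys <- setNames(
  lapply(props, function(p) setdiff(all_keys, names(p))), ids)

# Properties whose value is an array rather than a single value.
nonscalar <- setNames(
  lapply(props, function(p) names(p)[vapply(p, is.list, logical(1))]), ids)

str(missing_keys)
str(nonscalar)
```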

Ugh. Let's all use geopackages....

B

On Tue, Nov 29, 2022 at 11:34 AM Edzer Pebesma
<edzer.pebesma using uni-muenster.de> wrote:
>
> Interestingly, what seems to work is
>
> readLines('countrymasks.geojson') |> st_read() -> r
>
> with a warning:
>
> Warning message:
> In readLines("countrymasks.geojson") :
>    incomplete final line found on 'countrymasks.geojson'
>
>
> On 29/11/2022 00:58, Miluji Sb wrote:
> > Thank you. I will report this bug (I did not have the confidence to call
> > this a bug before).
> >
> > Even using your code, I get the same output.
> >
> > structure(list(X = c(-67.3804401, -67.36091, -67.3805899999999,
> > -67.3397099999998, -67.3780199, -67.3221199999999), Y = c(-55.5655699999996,
> > -55.5840098999999, -55.6004100000001, -55.6149699999997, -55.63521,
> > -55.6400899999997), L1 = c(1, 1, 1, 1, 1, 1), L2 = c(1, 1, 1,
> > 1, 1, 1), L3 = c(1, 1, 1, 1, 1, 1)), row.names = c(NA, 6L), class =
> > "data.frame")
> >
> > Thank you again.
> >
> > On Mon, Nov 28, 2022 at 11:13 PM Barry Rowlingson <b.rowlingson using gmail.com>
> > wrote:
> >
> >> This seems to be a weird bug in `st_read`. If you read it with an SQL
> >> query that matches every row it works:
> >>
> >> > js = st_read("./countrymasks.geojson", query="select * from countrymasks where 1 = 1")
> >> Reading query `select * from countrymasks where 1 = 1' from data
> >> source `/home/rowlings/Downloads/countrymasks.geojson' using driver
> >> `GeoJSON'
> >> Simple feature collection with 214 features and 15 fields
> >> Geometry type: MULTIPOLYGON
> >> Dimension:     XY
> >> Bounding box:  xmin: -180 ymin: -55.79439 xmax: 180 ymax: 83.62742
> >> Geodetic CRS:  WGS 84
> >>
> >> But leave out the query and you get that C code level error. Another
> >> equivalent query would be "select * from countrymasks" (without the
> >> "where" clause) but this
> >> triggers the error too. Very odd. Worth reporting as a bug?
> >>
> >> Barry
> >>
> >>
> >>
> >>
> >> On Mon, Nov 28, 2022 at 8:08 PM Miluji Sb <milujisb using gmail.com> wrote:
> >>>
> >>> Thank you for the reply. When I try using sf, I get the following error:
> >>>
> >>> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
> >>>    attempt to set index 210/210 in SET_STRING_ELT.
> >>>
> >>> Thanks again!
> >>>
> >>> On Mon, Nov 28, 2022 at 1:50 PM Josiah Parry <josiah.parry using gmail.com> wrote:
> >>>
> >>>> You're going to want to read the file with sf.
> >>>>
> >>>> Try object <- sf::st_read("~/countrymasks.geojson")
> >>>>
> >>>> On Mon, Nov 28, 2022 at 7:09 AM Miluji Sb <milujisb using gmail.com> wrote:
> >>>>
> >>>>> Greetings everyone,
> >>>>>
> >>>>> I would like to convert the geojson file
> >>>>> (https://drive.google.com/file/d/18h3sOjZg5jp5euLTWRi5mC40Sja8TZDN/view?usp=sharing)
> >>>>> to a dataframe - essentially a table in which coordinates are matched
> >>>>> to a country.
> >>>>>
> >>>>> I have tried the following;
> >>>>>
> >>>>> ###
> >>>>> states <- geojsonsf::geojson_sf("~/countrymasks.geojson")
> >>>>> geo <- geojsonsf::sf_geojson(states)
> >>>>> sf <- sf::st_read(geo, quiet = T )
> >>>>> df <- as.data.frame(sf::st_coordinates(sf) )
> >>>>> ##
> >>>>>
> >>>>> But I get the following output, and I am a bit lost. Any help will be
> >>>>> highly appreciated.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>>    structure(list(X = c(-67.3804401, -67.36091, -67.3805899999999,
> >>>>> -67.3397099999998, -67.3780199, -67.3221199999999), Y =
> >>>>> c(-55.5655699999996,
> >>>>> -55.5840098999999, -55.6004100000001, -55.6149699999997, -55.63521,
> >>>>> -55.6400899999997), L1 = c(1, 1, 1, 1, 1, 1), L2 = c(1, 1, 1,
> >>>>> 1, 1, 1), L3 = c(1, 1, 1, 1, 1, 1)), row.names = c(NA, 6L), class =
> >>>>> "data.frame")
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-Geo mailing list
> >>>>> R-sig-Geo using r-project.org
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>>
> >>>>
> >>>
> >>
> >
>
> --
> Edzer Pebesma
> Institute for Geoinformatics
> Heisenbergstrasse 2, 48151 Muenster, Germany
> Phone: +49 251 8333081
>


