[R-sig-Geo] rgdal writeOGR - clipping variable names

James Rooney ROONEYJ4 at tcd.ie
Mon Jul 7 13:35:29 CEST 2014


Haha, thanks very much for the comparison.

Hmm, it is something I shall have to think about. There was some reason I decided to use shapefiles, but it escapes my memory. Apart from that, I'm kind of invested now: if I change format, it could be a nightmare to trace the change through the workflow.
However, I take your thoughts on board and will think about it if a natural point in the work to make such a change presents itself!

Thanks again,

James
________________________________________
From: b.rowlingson at gmail.com [b.rowlingson at gmail.com] On Behalf Of Barry Rowlingson [b.rowlingson at lancaster.ac.uk]
Sent: 07 July 2014 09:44
To: James Rooney
Cc: R-sig-Geo at r-project.org
Subject: Re: [R-sig-Geo] rgdal writeOGR - clipping variable names

"Ten characters in field names should be enough for everyone" - Jack
Dangermond (ESRI founder and billionaire)

Okay, he didn't really say that (just as Bill Gates didn't say
something similar about 640K of RAM) but at some point someone decided
it was enough for DBF files.

So, you might want to think about not using shapefiles...

Alternatives include GML (which most modern GISs will read and write,
but which can be verbose; I've heard people swear at it almost as much
as at shapefiles) or SpatiaLite databases...

If your rgdal build includes the SQLite driver, you can use SpatiaLite
databases now, e.g.:

writeOGR(scot, "scotland.sqlite", "scot", driver = "SQLite",
         dataset_options = c("SPATIALITE=yes"))

[note: some options may have changed; I'm using a slightly old system here]

This creates a *single* file (win #1 over shapefiles) called
scotland.sqlite with a map layer in it, and preserves long field names
(win #2 over shapefiles). The SQLite database is an open standard
(that's a tie with shapefiles) but is being actively developed into a
wider geodatabase package standard (win #3 over shapefiles).

You can also store non-spatial data in it, since it's a SQLite database
file (win #4 over shapefiles) and so send someone a whole bunch of
spatial and non-spatial data in a single file (win #5).

SpatiaLite DBs can be read into QGIS, and I think they can be read
into the leading proprietary GIS package (call that a tie too).

At the moment, with rgdal, I can't figure out how to add several layers
to a single SpatiaLite file or overwrite a single existing layer, but I
suspect I'm just missing an option or two. I can just delete the
.sqlite file and recreate it, but that's an annoyance... Oh well,
there has to be one, doesn't there?

Barry



On Mon, Jul 7, 2014 at 8:16 AM, James Rooney <ROONEYJ4 at tcd.ie> wrote:
> Hi Roger,
>
> Many thanks for your answer - very enlightening.
> Apologies, I didn't realise the sessionInfo() output and other details would help.
> As it happens, your explanation makes perfect sense: it is only after I started using some longer and similar field names that the behaviour changed. Now that your explanation has shown me how it works, I should be able to avoid that. I will make changes and see if it works. If it doesn't work, I'll post back with all the version info etc.
>
> Many thanks,
> James
> ________________________________________
> From: Roger Bivand [Roger.Bivand at nhh.no]
> Sent: 06 July 2014 23:12
> To: James Rooney
> Cc: R-sig-Geo at r-project.org
> Subject: Re: [R-sig-Geo] rgdal writeOGR - clipping variable names
>
> On Sun, 6 Jul 2014, James Rooney wrote:
>
>> Hi all,
>>
>> I am using writeOGR to save ESRI shapefiles after a ton of processing.
>> Typical warning message I get is as follows:
>>
>>> writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial -
>>> Advanced/Irish Data/Processed Data/", layer="ED.cluster",driver="ESRI
>>> Shapefile",overwrite=T)
>> Warning message:
>> In writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial -
>> Advanced/Irish Data/Processed Data/", :
>>  Field names abbreviated for ESRI Shapefile driver
>>
>> It has been doing this all along, but lately its behaviour has changed. For
>> example, I frequently use the column id "GEOGID", which I inherited
>> somewhere along the line from some dataset.
>
> It would have helped if you had given the output of sessionInfo(), including
> the version of rgdal and the startup messages printed on loading rgdal, as the
> rgdal version, underlying GDAL version, and platform may make a difference. In
> addition, it would help to know how you installed rgdal.
>
> The warning and the field-name abbreviation occur in R/ogr_write.R after line
> 116, and have been present since mid-November 2011. This is based on:
>
> http://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/shape/drv_shapefile.html
>
> after line 120, describing what GDAL does to field names, and trying to
> pre-empt this in R. As the OGR driver only supports names <= 10 characters
> long, the R code uses up to two passes of abbreviate() to shorten field
> names in data to be exported, first trying to get to minlength=7
> characters, and if this is insufficient, to minlength=5. Your case
> suggests that you now have so many similar field names over 10 characters
> long that abbreviate() no longer succeeds with a target of minlength=7, so
> it falls back to minlength=5; this would explain the change in behaviour.
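The two-pass shortening Roger describes can be sketched in base R to see why a 6-character name like GEOGID suddenly loses a letter. This is an illustration, not the rgdal source: the function name `shorten_dbf_names()` and the trigger for the second pass ("any first-pass result still longer than 10 characters") are assumptions, and the actual check in R/ogr_write.R may differ.

```r
# Illustrative sketch only: mimics the two-pass strategy described above
# with base R's abbreviate(); it is NOT the rgdal implementation.
shorten_dbf_names <- function(nms, max_len = 10L) {
  # Nothing to do if every name already fits the DBF limit.
  if (all(nchar(nms) <= max_len)) return(nms)
  # Pass 1: aim for 7 characters. With strict = FALSE (the default),
  # abbreviate() lengthens clashing abbreviations to keep them unique,
  # so with many similar names some results can exceed max_len.
  out <- abbreviate(nms, minlength = 7L)
  if (any(nchar(out) > max_len)) {
    # Pass 2 (assumed trigger): redo the WHOLE vector with minlength = 5,
    # which also shortens names that were already short enough.
    out <- abbreviate(nms, minlength = 5L)
  }
  unname(out)
}

# A 6-character name survives a minlength = 7 pass but not a minlength = 5 one:
abbreviate("GEOGID", minlength = 7)  # unchanged
abbreviate("GEOGID", minlength = 5)  # "GEOGI", matching James's report
```

The key point is that the second pass is applied to the whole name vector, so short names pay the price for the long, similar ones.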
>
> Testing each name separately, rather than calling abbreviate() on the whole
> vector of names, would incur the additional cost of checking for uniqueness;
> abbreviate() itself preserves uniqueness when strict=FALSE.
>
> The reason for the difficulty is that the underlying DBF format cannot be
> relied on to support field names > 10 characters long, so the OGR driver,
> and writeOGR(), are obliged to protect users from arbitrary shortening,
> which could lead to multiple fields having the same name. Some
> applications may permit longer names, but others do not, and those are the
> ones that set the limit.
>
> The only reliable resolution is to give your own variable/field names that
> are all <= 10 characters long if you use the ESRI Shapefile driver for
> exporting objects. You can handle this in scripts yourself by looking at
> nchar(names(<obj>)), and abbreviating uniquely any that are longer.
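Roger's suggestion of fixing the names yourself before export can be sketched as a small helper that leaves names that already fit (such as GEOGID) untouched and only shortens the long ones. `fit_dbf_names()` is a hypothetical name, and it deliberately swaps abbreviate() for plain truncation plus a numeric suffix, so the resulting names are easier to predict.

```r
# Hypothetical helper (not part of rgdal): make every field name fit the
# 10-character DBF limit while leaving names that already fit untouched.
# Uses plain truncation plus a numeric suffix instead of abbreviate().
fit_dbf_names <- function(nms, max_len = 10L) {
  stopifnot(is.character(nms), !anyDuplicated(nms))
  out <- nms
  for (i in which(nchar(nms) > max_len)) {
    # Start from the first max_len characters of the long name.
    cand <- substr(nms[i], 1L, max_len)
    k <- 0L
    # On a clash with any other current name, replace the tail with a
    # counter ("populatio1", "populatio2", ...) until the name is unique.
    while (cand %in% out[-i]) {
      k <- k + 1L
      suffix <- as.character(k)
      cand <- paste0(substr(nms[i], 1L, max_len - nchar(suffix)), suffix)
    }
    out[i] <- cand
  }
  out
}

fit_dbf_names(c("GEOGID", "population_total", "population_female", "AREA"))
# "GEOGID" "population" "populatio1" "AREA"
```

Assuming `ED` from James's example is a Spatial*DataFrame, something like `names(ED@data) <- fit_dbf_names(names(ED@data))` before calling writeOGR() should leave the driver nothing to abbreviate; keeping the old and new names side by side makes them easier to trace through a workflow.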
>
> Note that there are also substantial encoding challenges in the DBF format,
> although I don't think that this is affecting nchar() here; keeping
> to ASCII (always single byte) in field names with an archaic format like
> DBF may be judicious.
>
> Hope this clarifies,
>
> Roger
>
>> Until recently it was happily saving this column name in its entirety.
>> Lately, for some mysterious reason, it is not saving it fully and only
>> saves 5 letters: "GEOGI". This, as you can imagine, is causing me
>> trouble.
>>
>> Anyhow, why are field names so short for ESRI shapefiles, and why does it sometimes accept 6-letter fields and other times not? It would be great if it allowed longer field names, as 5 letters is not a lot when you have over 100 variables; it gets confusing even when you use codes.
>>
>> Many thanks,
>> James
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 91 00
> e-mail: Roger.Bivand at nhh.no
>
