[R-sig-Geo] rgdal writeOGR - clipping variable names

James Rooney ROONEYJ4 at tcd.ie
Mon Jul 7 09:16:49 CEST 2014


Hi Roger,

Many thanks for your answer - very enlightening.
Apologies, I didn't realise the sessionInfo and other stuff would help.
As it happens, your explanation makes perfect sense - the behaviour changed only after I started using some longer and similar field names. Now that I know how it works, I should be able to avoid that. I will make changes and see if it works. If it doesn't, I'll post back with all the version info etc.

Many thanks,
James
________________________________________
From: Roger Bivand [Roger.Bivand at nhh.no]
Sent: 06 July 2014 23:12
To: James Rooney
Cc: R-sig-Geo at r-project.org
Subject: Re: [R-sig-Geo] rgdal writeOGR - clipping variable names

On Sun, 6 Jul 2014, James Rooney wrote:

> Hi all,
>
> I am using writeOGR to save ESRI shapefiles after a ton of processing.
> Typical warning message I get is as follows:
>
>> writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial -
>> Advanced/Irish Data/Processed Data/", layer="ED.cluster",driver="ESRI
>> Shapefile",overwrite=T)
> Warning message:
> In writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial -
> Advanced/Irish Data/Processed Data/", :
>  Field names abbreviated for ESRI Shapefile driver
>
> Now it has been doing this all along. But lately it has changed. For
> example - I frequently use the column id "GEOGID" as I've inherited it
> somewhere along the line from some dataset I got somewhere.

It would have helped if you gave the output of sessionInfo() - including
the version of rgdal, and the startup messages on loading rgdal, as rgdal
version, underlying GDAL version, and platform may make a difference. In
addition, it would help to know how you installed rgdal.

The warning and field-name abbreviation occur in R/ogr_write.R after line
116, and has been present since mid-November 2011. This is based on:

http://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/shape/drv_shapefile.html

after line 120, describing what GDAL does to field names, and trying to
pre-empt this in R. As the OGR driver only supports names <= 10 characters
long, the R code uses up to two passes of abbreviate() to shorten field
names in data to be exported, first trying to get to minlength=7
characters, and if this is insufficient, to minlength=5. Your case
suggests that you now have so many similar field names over 10 characters
long that abbreviate() no longer succeeds with a wish of minlength=7, so
goes to minlength=5 - this would explain the change in behaviour.
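
The two-pass logic described above can be sketched roughly as follows. This
is a minimal illustration, not the code in R/ogr_write.R: the function name
shorten_two_pass() and the example field names are hypothetical, and the
second pass is applied here to the original names, which is an assumption.

```r
# Hypothetical field names, not taken from the original data
fld <- c("GEOGID", "population_density_2006", "population_density_2011")

# Sketch of a two-pass shortening (assumed logic, not the rgdal source)
shorten_two_pass <- function(nms) {
  # Nothing to do if every name already fits the 10-character limit
  if (all(nchar(nms) <= 10)) return(nms)
  # First pass: ask abbreviate() for at least 7 characters
  out <- abbreviate(nms, minlength = 7)
  # Second pass: if any result is still too long, ask for at least 5
  if (any(nchar(out) > 10)) out <- abbreviate(nms, minlength = 5)
  unname(out)
}

shorten_two_pass(fld)
```

Because abbreviate() is applied to the whole vector and only alters names
longer than minlength, a 6-character name such as GEOGID is left alone by a
minlength=7 pass but becomes a candidate for shortening once the
minlength=5 pass is triggered by other, longer names.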

Testing each name separately rather than using abbreviate() on the vector
of names would incur the further cost of checking for uniqueness - the
function preserves uniqueness when strict=FALSE.

The reason for the difficulty is that the underlying DBF format cannot be
relied on to support field names > 10 characters long, so the OGR driver,
and writeOGR(), are obliged to protect users from arbitrary shortening,
which could lead to multiple fields having the same name. Some
applications may permit longer names, but others do not, and those are the
ones that set the limit.

The only reliable resolution is to give your own variable/field names that
are all <= 10 characters long if you use the ESRI Shapefile driver for
exporting objects. You can handle this in scripts yourself by looking at
nchar(names(<obj>)), and abbreviating uniquely any that are longer.
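
As a minimal sketch of that check, assuming a plain data.frame as a
stand-in for the attribute table of the object to be exported (the column
names and the short replacements below are hypothetical):

```r
# Stand-in attribute table; in practice this would be the data of the
# object passed to writeOGR()
df <- data.frame(GEOGID = 1:3,
                 population_density_2006 = c(1.2, 3.4, 5.6),
                 population_density_2011 = c(1.3, 3.5, 5.7))

# Which field names exceed the 10-character limit?
too_long <- nchar(names(df)) > 10
names(df)[too_long]

# Hand-chosen short names, keyed by the original long names
short <- c(population_density_2006 = "popden06",
           population_density_2011 = "popden11")
names(df)[too_long] <- unname(short[names(df)[too_long]])

# Verify before exporting: all names short enough and all unique
stopifnot(all(nchar(names(df)) <= 10), anyDuplicated(names(df)) == 0)
```

Choosing the short names by hand keeps them stable from one run to the
next, whatever other fields are added later.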

Note that there are also substantial encoding challenges in the DBF format,
although I don't think that this is affecting nchar() here - keeping
to ASCII (always single byte) in field names with an archaic format like
DBF may be judicious.
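
One way to spot non-ASCII field names is to compare character and byte
counts, as in this small sketch (the field names are hypothetical):

```r
# Hypothetical field names; the second contains a non-ASCII character
fld <- c("GEOGID", "caf\u00e9_count")

# For pure ASCII names the byte count equals the character count
n_bytes <- nchar(fld, type = "bytes")
n_chars <- nchar(fld, type = "chars")
has_multibyte <- n_bytes != n_chars

# Names that would need attention before export
fld[has_multibyte]
```

A name can be within 10 characters and still take more than 10 bytes,
which matters if the limit is counted in bytes.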

Hope this clarifies,

Roger

> Until recently it was happily saving this column name in its entirety.
> Lately, for some mystery reason it is not saving it fully and only
> saving 5 letters - "GEOGI". This, as you can imagine is causing me
> troubles.
>
> Anyhow - why are field names so short for ESRI shapefiles, and why does it sometimes accept 6-letter fields and other times not? It would be great if it would allow longer field names, as 5 letters is not a lot when you have over 100 variables - it gets confusing even when you use codes.
>
> Many thanks,
> James
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no


