[R-sig-Geo] rgdal writeOGR - clipping variable names

Roger Bivand Roger.Bivand at nhh.no
Mon Jul 7 00:12:22 CEST 2014


On Sun, 6 Jul 2014, James Rooney wrote:

> Hi all,
>
> I am using writeOGR to save ESRI shapefiles after a ton of processing. 
> Typical warning message I get is as follows:
>
>> writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial - Advanced/Irish Data/Processed Data/", layer="ED.cluster", driver="ESRI Shapefile", overwrite=T)
> Warning message:
> In writeOGR(ED, "/Users/jrooney/Documents/My Projects/ALS Spatial - Advanced/Irish Data/Processed Data/", :
>  Field names abbreviated for ESRI Shapefile driver
>
> Now it has been doing this all along. But lately it has changed. For 
> example - I frequently use the column id "GEOGID" as I've inherited it 
> somewhere along the line from some dataset I got somewhere.

It would have helped if you had given the output of sessionInfo() - including 
the version of rgdal, and the startup messages on loading rgdal, as the rgdal 
version, the underlying GDAL version, and the platform may make a difference. 
In addition, it would help to know how you installed rgdal.

The warning and field-name abbreviation occur in R/ogr_write.R after line 
116, and have been present since mid-November 2011. This is based on:

http://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/shape/drv_shapefile.html

after line 120, which describes what GDAL does to field names; the R code 
tries to pre-empt this. As the OGR driver only supports names <= 10 
characters long, the R code uses up to two passes of abbreviate() to 
shorten field names in data to be exported, first trying to get to 
minlength=7 characters, and if this is insufficient, to minlength=5. Your 
case suggests that you now have so many similar field names over 10 
characters long that abbreviate() no longer succeeds with a wish of 
minlength=7, so it goes to minlength=5 - this would explain the change in 
behaviour. Because abbreviate() works on the whole vector of names, the 
second pass also shortens names that already fit, such as the 6-character 
GEOGID, down to 5 characters (GEOGI).
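The two-pass behaviour can be sketched in a few lines of R. shorten_for_dbf() 
below is a hypothetical helper that approximates the logic described above 
rather than reproducing the rgdal source; in particular, applying the second 
pass to the output of the first is an assumption. It shows why a 6-character 
name such as GEOGID survives minlength=7 but not minlength=5.

```r
# Hypothetical helper approximating the two-pass logic described above;
# an illustration only, not the rgdal source itself.
shorten_for_dbf <- function(nms, maxlen = 10L) {
  # Nothing to do if every name already fits
  if (all(nchar(nms) <= maxlen)) return(nms)
  # First pass: ask abbreviate() for 7-character names; with the default
  # strict = FALSE it keeps results unique, and may go longer than 7 to do so
  out <- abbreviate(nms, minlength = 7, named = FALSE)
  # Second pass (assumed to act on the first pass's output): if uniqueness
  # pushed any name past the limit, try again with minlength = 5
  if (any(nchar(out) > maxlen)) {
    out <- abbreviate(out, minlength = 5, named = FALSE)
  }
  out
}

nms <- c("GEOGID", "population_total")
shorten_for_dbf(nms)                                # first pass suffices; "GEOGID" is kept
abbreviate("GEOGID", minlength = 5, named = FALSE)  # a minlength=5 pass cuts it to 5 characters
```

Names of 7 characters or fewer are left alone by the first pass, which is why 
GEOGID came through intact until the second pass started being needed.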

Testing each name separately rather than using abbreviate() on the vector 
of names would incur the further cost of checking for uniqueness - 
abbreviate() already keeps the whole vector of names unique when 
strict=FALSE (the default).

The reason for the difficulty is that the underlying DBF format cannot be 
relied on to support field names > 10 characters long, so the OGR driver, 
and writeOGR(), are obliged to protect users from arbitrary shortening, 
which could lead to multiple fields having the same name. Some 
applications may permit longer names, but others do not, and those are the 
ones that set the limit.

The only reliable resolution is to give your own variable/field names that 
are all <= 10 characters long if you use the ESRI Shapefile driver for 
exporting objects. You can handle this in scripts yourself by looking at 
nchar(names(<obj>)), and abbreviating uniquely any that are longer.

Note that there are also substantial encoding challenges in the DBF 
format, although I don't think that this is affecting nchar() here - 
keeping to ASCII (always single byte) in field names with an archaic 
format like DBF may be judicious.
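A quick way to check field names for non-ASCII characters, sketched under 
stated assumptions: check_dbf_names() is a hypothetical helper, and the 
10-unit limit in the DBF header is taken to count bytes, so a name like 
"Bodø" is 4 characters but 5 bytes in UTF-8.

```r
# Hypothetical helper: tabulate character count, byte count and ASCII-ness
# for each field name (the DBF limit is assumed here to be in bytes)
check_dbf_names <- function(nms, maxlen = 10L) {
  utf8 <- enc2utf8(nms)
  res <- data.frame(
    name  = utf8,
    chars = nchar(utf8, type = "chars"),
    bytes = nchar(utf8, type = "bytes"),
    # iconv() returns NA when a name cannot be represented in ASCII
    ascii = !is.na(iconv(utf8, from = "UTF-8", to = "ASCII")),
    stringsAsFactors = FALSE
  )
  res$ok <- res$ascii & res$bytes <= maxlen
  res
}

check_dbf_names(c("GEOGID", "Bod\u00f8"))
```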

Hope this clarifies,

Roger

> Until recently it was happily saving this column name in its entirety. 
> Lately, for some mystery reason it is not saving it fully and only 
> saving 5 letters - "GEOGI". This, as you can imagine is causing me 
> troubles.
>
> Anyhow - why are field names so short for ESRI shapefiles and why does it sometimes accept 6-letter fields and other times not? It would be great if it would allow longer field names, as 5 letters is not a lot when you have over 100 variables - it gets confusing even when you use codes.
>
> Many thanks,
> James
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no


