[R-sig-Geo] writing shapefiles / DBF files when input data contains NA

Roger Bivand Roger.Bivand at nhh.no
Tue Oct 7 09:10:50 CEST 2008


On Mon, 6 Oct 2008, Dylan Beaudette wrote:

> Hi,
>
> I have noticed that saving data to file formats that include a DBF results in
> bogus data where there were NAs. Using the write.dbf() function from
> the foreign package seems to work a little better, but I still get odd
> results in numeric columns. Writing to GRASS with the methods in the
> spgrass6 package results in something that looks like this:
>

Dylan,

I'm afraid that there is no good solution for this at all. DBF does not 
seem to have a clear and uniform NA treatment (or even !is.finite() 
treatment). The only work-around is to preprocess the data.frame in the 
output object to insert known NODATA values, and to replace those flags 
manually on the GRASS side. This could possibly be written as a wrapper 
around writeVECT6(). The help page does say:

     "Please note that the OGR drivers used may not handle missing data
      gracefully, and be prepared to have to correct for this manually.
      For example use of the 'readOGR' PostGIS driver directly may
      perform better than moving the data through the DBF driver used in
      this function - or a PostgreSQL driver used directly or through
      ODBC may be a solution. Do not rely on missing values of vector
      data moving smoothly across the interface."
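
A minimal sketch of the preprocessing work-around mentioned above, in base R only. The helper name flag_missing() and the flag values -9999 and "NODATA" are assumptions for illustration; pick flags that cannot occur in the real data. Logical and other column types are left untouched here.

```r
# Hypothetical helper: replace NA / non-finite values with explicit flags
# before the data.frame is written out through a DBF-based driver.
flag_missing <- function(df, num_flag = -9999, chr_flag = "NODATA") {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.factor(x)) x <- as.character(x)  # treat factors as text
    if (is.numeric(x)) {
      x[!is.finite(x)] <- num_flag          # catches NA, NaN, Inf and -Inf
    } else if (is.character(x)) {
      x[is.na(x)] <- chr_flag
    }
    df[[nm]] <- x
  }
  df
}

# Small stand-in for the attribute table of an sp object
d <- data.frame(elev = c(1543, NA, NaN), f = factor(c("j", NA, "j")))
flagged <- flag_missing(d)
print(flagged)
```

For an sp object this would be applied to the attribute table (for example e@data <- flag_missing(e@data)) before calling writeOGR() or writeVECT6(); the flags then still have to be replaced manually on the GRASS side.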

I did try to look at the SQLite driver on the GRASS side, which might be 
more robust, but did not see how to proceed.

One possibility is not to recode, but to build an NA mask on the R side, 
and then loop over fields on the GRASS side for the chosen driver 
inserting NAs in the correct rows (whatever the syntax for that might be). 
Would this be db.execute with an insertion of SQL NULL?
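
The R side of the mask idea could be sketched as below. This only builds SQL strings; whether the chosen GRASS database driver accepts "SET ... = NULL" is exactly the open question above and is not confirmed here. The helper name na_null_sql(), the table name "pts", and the use of a "cat" key column to identify rows are assumptions.

```r
# Hypothetical helper: one UPDATE statement per (column, row) cell that is
# NA on the R side. 'tab' is the attribute table name and 'key' the column
# whose values identify rows in that table (both assumed).
na_null_sql <- function(df, tab, key = "cat") {
  cols <- setdiff(names(df), key)
  mask <- is.na(df[cols])          # logical matrix, TRUE where NA or NaN
  stmts <- character(0)
  for (nm in cols) {
    ids <- df[[key]][mask[, nm]]   # key values of the rows to blank out
    if (length(ids) > 0) {
      stmts <- c(stmts,
                 sprintf("UPDATE %s SET %s = NULL WHERE %s = %s",
                         tab, nm, key, ids))
    }
  }
  stmts
}

d <- data.frame(cat = 1:3, elev = c(1543, NA, 1656), f = c("j", NA, NA))
stmts <- na_null_sql(d, tab = "pts")
writeLines(stmts)
```

The mask has to be built before any placeholder values are written into the exported table, and the statements would be run only after the import has succeeded.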

Can we redirect this discussion to the statgrass list, because GRASS 
developers follow that list?

Best wishes,

Roger

> ### code snippet:
> writeVECT6(SDF=spatial.data, vname='pedons_grouped')
>
>
> ### errors:
> Projection of input dataset and current location appear to match
> Layer: pedons_g
> WARNING: Column name changed: 'describer.' -> 'describer_'
> WARNING: Column name changed: 'cat' -> 'cat_'
> Importing map 103 features...
> DBMI-DBF driver error:
> SQL parser error: syntax error, unexpected NAME processing 'nan'
> in statement:
> insert into pedons_grouped values ( 1, 'd2g1', 'alex',
> 32.311427999999999,      252.434875000000005,     7227.804688000000169,
> -0.000162000000000,           3,                      nan, 'NA',
> -2147483648, 'NA', 'NA', -2147483648, -2147483648, 'NA',
> nan, '1', 'NA' )
> Error in db_execute_immediate()
>
> ERROR: Cannot insert new row: insert into pedons_grouped values ( 1,
>       'd2g1', 'alex', 32.311427999999999, 252.434875000000005,
>       7227.804688000000169, -0.000162000000000, 3, nan, 'NA',
> -2147483648,
>       'NA', 'NA', -2147483648, -2147483648, 'NA', nan, '1', 'NA' )
>
>
> ### another self-contained example:
>
>
> # load libs
> library(sp)
> library(rgdal)
> library(foreign)
>
> # read in xy data and promote to sp object
> e <- read.csv(url('http://casoilresource.lawr.ucdavis.edu/drupal/files/elev.csv_.txt'))
> coordinates(e) <- ~ x+y
>
> # add a factor column
> e@data$f <- factor(rep(letters[1:10], each=30))
>
> # add some NA
> e@data$elev[288:300] <- NA
> e@data$f[288:300] <- NA
>
> # save sp object to shapefile
> writeOGR(e, driver='ESRI Shapefile', dsn='.', layer='pts')
>
>
> # the results from dumping the DBF:
> [...]
> 285,1543,j
> 286,1518,j
> 287,1656,j
> 288,-2147483648,NA
> 289,-2147483648,NA
> [...]
>
>
> # one more try with the foreign package's write.dbf()
> write.dbf(e@data, file='second_try.dbf')
>
> # results: look better, although the '******' isn't a legal int!
> [...]
> 285,1543,j
> 286,1518,j
> 287,1656,j
> 288,*******,
> 289,*******,
> [...]
>
>
> Any ideas on how to work with missing data in numeric columns, when
> the dreaded DBF file is involved??? This is a real show-stopper when
> sending vector data back to GRASS, as it seems to rely on intermediate
> files. Maybe it would be a good idea to send the geometry first, and
> then the attribute data. There would still be a problem if the DBF
> back-end is in use...
>
> Cheers,
>
> Dylan
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



