[R-sig-Geo] writing shapefiles / DBF files when input data contains NA

Dylan Beaudette dylan.beaudette at gmail.com
Wed Oct 8 23:05:58 CEST 2008


On Wednesday 08 October 2008, Roger Bivand wrote:
> On Wed, 8 Oct 2008, Dylan Beaudette wrote:
> > On Tuesday 07 October 2008, Dylan Beaudette wrote:
> >> On Tuesday 07 October 2008, Roger Bivand wrote:
> >>> On Mon, 6 Oct 2008, Dylan Beaudette wrote:
> >>>> Hi,
> >>>>
> >>>> I have noticed that saving data to files that include a DBF results in
> >>>> bogus data where there were NA. Using the write.dbf() function from
> >>>> the foreign package seems to work a little better, but I still get odd
> >>>> results in numeric columns. Writing to GRASS with the methods in the
> >>>> spgrass6 package results in something that looks like this:
> >>>
> >>> Dylan,
> >>>
> >>> I'm afraid that there is no good solution for this at all. DBF does not
> >>> seem to have a clear and uniform NA treatment (or even !is.finite()
> >>> treatment). The only work-around is to preprocess the data.frame in the
> >>> output object to insert known NODATA values, and to replace those flags
> >>> manually on the GRASS side. This could possibly be written as a wrapper
> >>> around writeVECT6(). The help page does say:
> >>>
> >>>      "Please note that the OGR drivers used may not handle missing data
> >>>       gracefully, and be prepared to have to correct for this manually.
> >>>       For example use of the 'readOGR' PostGIS driver directly may
> >>>       perform better than moving the data through the DBF driver used
> >>>       in this function - or a PostgreSQL driver used directly or through
> >>>       ODBC may be a solution. Do not rely on missing values of vector
> >>>       data moving smoothly across the interface."
> >>>
> >>> I did try to look at the SQLite driver on the GRASS side, which might
> >>> be more robust, but did not see how to proceed.
> >>>
> >>> One possibility is not to recode, but to build an NA mask on the R
> >>> side, and then loop over fields on the GRASS side for the chosen driver
> >>> inserting NAs in the correct rows (whatever the syntax for that might
> >>> be). Would this be db.execute with an insertion of SQL NULL?
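
The two work-arounds Roger describes (recoding NA to a known NODATA flag, and
keeping an NA mask so the flagged rows can be reset afterwards) can be sketched
as follows. This is only an illustration in Python, with plain lists standing
in for the columns of the data.frame; the sentinel value -9999 and the names
NODATA, is_missing, recode_with_mask and restore_missing are assumptions made
for the example, not part of rgdal, spgrass6 or GRASS.

```python
import math

NODATA = -9999.0  # hypothetical sentinel; must not occur in the real data


def is_missing(value):
    """True for None and for non-finite floats (NaN, +/-Inf)."""
    if value is None:
        return True
    return isinstance(value, float) and not math.isfinite(value)


def recode_with_mask(column, nodata=NODATA):
    """Return (recoded column, boolean mask of the rows that were missing)."""
    mask = [is_missing(v) for v in column]
    # Refuse to continue if the sentinel is already a legitimate value.
    if any(v == nodata for v, m in zip(column, mask) if not m):
        raise ValueError("sentinel collides with a real value")
    recoded = [nodata if m else v for v, m in zip(column, mask)]
    return recoded, mask


def restore_missing(column, mask):
    """Put None back into the masked rows after the round trip."""
    return [None if m else v for v, m in zip(column, mask)]


elev = [812.5, float("nan"), 790.0, None]
recoded, mask = recode_with_mask(elev)
restored = restore_missing(recoded, mask)
```

Keeping the mask next to the recoded column means the receiving side does not
have to guess which rows carried the flag, which is the step that would
otherwise be done by hand (or with SQL NULL updates) in GRASS.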
> >>>
> >>> Can we redirect this discussion to the statgrass list, because GRASS
> >>> developers follow that list?
> >>>
> >>> Best wishes,
> >>>
> >>> Roger
> >>
> >> Sorry for the cross-posting. Wanted to clarify where this thread is
> >> going/went.
> >>
> >> Hi Roger--
> >>
> >> It looks like the limiting factor in this equation is the code used in
> >> v.out.ogr.
> >>
> >> From the GRASS-dev + Frank W's help:
> >>>> Sounds good :)
> >>>> Does anyone know how to fix
> >>>>  vector/v.out.ogr/main.c
> >>>> to support NULLs? I see db_set_value_null() in
> >>>>  lib/db/dbmi_base/value.c
> >>>> which might be relevant.
> >>>
> >>> Markus,
> >>>
> >>> Once you establish which GRASS attributes are NULL, you can ensure they
> >>> are pushed out to OGR as null by just skipping the step that sets them.
> >>> Perhaps that will help a bit.
> >>
> >> So, once v.out.ogr is fixed, this should clear up several issues:
> >>
> >> 1. import of vector data into R via spgrass6 methods
> >> 2. better compatibility of vector data exported from GRASS
> >>
> >> I still do not know why writeOGR() does not create correct DBF files...
> >> it may be related to the code in v.out.ogr....
> >>
> >> Cheers,
> >>
> >> Dylan
> >
> > Some follow-up: the incorrect handling of NULL values appears to be
> > related to the current implementation of v.out.ogr AND readOGR() /
> > writeOGR().
>
> OK, this makes sense, because parts of readOGR() / writeOGR() were written
> based on the logic of v.in.ogr and v.out.ogr, and more attention was given
> to the geometries than the attribute fields. If the GRASS code was taking
> liberties with handling NAs, then that behaviour is very probably present
> in readOGR() / writeOGR() too.

If that is the case, then a fix for one should be easy to port to the
other. A good place to start looking for an answer would probably be the
source for v.in.ogr, as it correctly preserves NULL data when importing from
shapefiles. I haven't tried anything else.

> The rgdal package has a public sourceforge CVS repository, so everybody
> please feel free to browse for bugs. It would be helpful to have a set of
> vector files with valid NAs (not just shapefiles), and a set of sp objects
> with NAs, and to be able to move them in and out of both R and GRASS (and
> other software) with the NAs intact.

Attached to this message is one such shapefile. Sorry, I do not have one in
another vector format.

> As a first bite, OGRFeature::IsFieldSet() seems to test whether the field
> is set or not. It isn't used in ogrReadColumn() in src/ogrsource.cpp in
> rgdal, nor the equivalent in OGR_write() in src/OGR_write.cpp.
>
> Assuming that we can correct these to use OGR NULL data representations
> (would that mean unsetting the field for the feature?), we then depend on the
> drivers using the same logic. In addition, non-OGR written files need to
> use the same understanding of NULL as the OGR drivers. GRASS v.in.ogr()
> does use OGR_F_IsFieldSet(), and if not set writes a NULL to numeric
> fields and an empty string to the others. Fixing writeOGR() ought to get
> NAs from R to GRASS. v.out.ogr does not seem to use OGR_F_UnsetField() on
> the fields being output, and readOGR() does not test for the fields being
> unset either - so getting NAs from GRASS to R needs more work.
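
The read-side logic described above for v.in.ogr (test whether the field is
set; if it is not, store NULL for numeric fields and an empty string for the
others) might look roughly like this. It is a Python sketch only: each feature
is modelled as a dict that lacks the key for an unset field, which stands in
for a false result from the is-field-set test, and read_column is a
hypothetical name, not a function in rgdal or OGR.

```python
def read_column(features, name, numeric):
    """Collect one attribute column, checking whether each field is set."""
    out = []
    for feat in features:
        if name in feat:        # stand-in for the is-field-set test
            out.append(feat[name])
        elif numeric:
            out.append(None)    # NULL / NA for numeric fields
        else:
            out.append("")      # empty string for the other field types
    return out


features = [
    {"cat": 1, "clay": 21.5, "name": "a"},
    {"cat": 2},                 # clay and name are unset
    {"cat": 3, "clay": 0.0, "name": "c"},
]
clay = read_column(features, "clay", numeric=True)
names = read_column(features, "name", numeric=False)
```

Doing the test before fetching the value keeps a real 0.0 (third feature)
distinct from a missing value (second feature).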
>
> This is described in extenso here because things don't happen by
> themselves, and this particular overlap of R/OGR/GRASS code probably
> matters to regular users of rgdal. Collaboration in fixing the handling of
> NAs in vector data files invited!
>
> Roger
>

Maybe this bit from M. Neteler / Frank W. will help:

Markus:
>  vector/v.out.ogr/main.c
> to support NULLs? I see db_set_value_null() in
>  lib/db/dbmi_base/value.c
> which might be relevant.

Frank:
> Once you establish which GRASS attributes are NULL, you can ensure they
> are pushed out to OGR as null by just skipping the step that sets them.
> Perhaps that will help a bit.
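
Frank's suggestion for the write side (never set a field whose value is NULL,
so that it stays unset in the output feature) could be sketched like this.
Again this is Python for illustration only: a dict stands in for the output
feature, None stands in for a GRASS NULL, and build_feature is a hypothetical
helper, not code from v.out.ogr or writeOGR().

```python
def build_feature(row, field_names):
    """Build an output feature, leaving NULL attributes unset."""
    feature = {}
    for name in field_names:
        value = row.get(name)
        if value is None:
            continue            # skip the set step: the field stays unset
        feature[name] = value
    return feature


rows = [
    {"cat": 1, "clay": 21.5},
    {"cat": 2, "clay": None},   # NULL attribute
]
out = [build_feature(r, ["cat", "clay"]) for r in rows]
```

A reader that tests for unset fields would then see the second feature's
"clay" as missing rather than as a made-up number.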

Cheers,

Dylan

-------------- next part --------------
PROJCS["NAD_1927_UTM_Zone_13N",GEOGCS["GCS_North_American_1927",DATUM["D_North_American_1927",SPHEROID["Clarke_1866",6378206.4,294.978698213898]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-105],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a_temp2.shx
Type: application/octet-stream
Size: 300 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a_temp2.dbf
Type: application/x-dbase
Size: 1518 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a_temp2.shp
Type: application/octet-stream
Size: 800 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment-0001.obj>