[R] Summary: Unexpected result of read.dbf

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 19 18:08:36 CEST 2005


It really isn't clear that this is correct.  The reason is correct:
read.dbf treats numeric fields with no decimals as integers, and that _is_
as stated on the help page.  So it is definitely not a `bug', and reading
the help would have shown the reason for the original question.
[In general I do not reply to questions that can be answered from the help
page.]

I believe this field has been incorrectly coded as numeric, as it seems to 
be a factor ('keycode').  In particular, 19 is not a valid field size for 
a numeric field.

If one wants to allow this, I think we have to use double for a field in 
which any value is not representable as an integer, and not just if the 
field size exceeds 9.  I have been working on implementing that.

On Fri, 19 Aug 2005, Susumu Tanimura wrote:

> Hi there,
>
> This is a summary of, and a patch for, a bug in read.dbf, demonstrated in
> Message-Id: <20050818150446.697835cb.stanimura-ngs at umin.ac.jp>.
>
> After consulting Rjpwiki, a cyber-community of R users in Japan, we
> found the cause, and a patch was proposed.
>
> Overflow occurs when read.dbf reads a dbf file with a field holding
> long signed integers. For example,
>
> $ dbf2txt test.dbf
> #KEYCODE
> 422010010
> 42201002101
> 42201002102
> 42201002103
> 42201002104
> 422010060
> 422010071
> 422010072
> 42201008001
> 42201008002
>
> The KEYCODE field is of numeric type, 19 digits wide, with no decimals.
> You can create this file with OpenOffice.org Calc, txt2dbf, and so on.
> Also prepare a CSV file holding the same data.
>
> > library(foreign)
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
>        KEYCODE   KEYCODE
> 1    422010010 422010010
> 2  42201002101        NA
> 3  42201002102        NA
> 4  42201002103        NA
> 5  42201002104        NA
> 6    422010060 422010060
> 7    422010071 422010071
> 8    422010072 422010072
> 9  42201008001        NA
> 10 42201008002        NA
>
> The problem does not occur when the field has decimals, e.g. numeric
> type, 19 digits wide, with 5 decimals.
>
> The following patch was written by Mr. Eiji Nakama.
>
> --- foreign.orig/src/dbfopen.c  2005-08-19 18:54:06.000000000 +0900
> +++ foreign/src/dbfopen.c       2005-08-19 18:58:06.000000000 +0900
> @@ -970,7 +970,8 @@
>              || psDBF->pachFieldType[iField] == 'F' )
>        /* || psDBF->pachFieldType[iField] == 'D' ) D is Date */
>     {
> -       if( psDBF->panFieldDecimals[iField] > 0 )
> +       if( psDBF->panFieldDecimals[iField] > 0 ||
> +               psDBF->panFieldSize[iField] > 9 )
>            return( FTDouble );
>        else
>            return( FTInteger );
>
> After adopting the patch, read.dbf works correctly.
>
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
>       KEYCODE     KEYCODE
> 1    422010010   422010010
> 2  42201002101 42201002101
> 3  42201002102 42201002102
> 4  42201002103 42201002103
> 5  42201002104 42201002104
> 6    422010060   422010060
> 7    422010071   422010071
> 8    422010072   422010072
> 9  42201008001 42201008001
> 10 42201008002 42201008002
>
> --
> Susumu Tanimura
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
