[R] Summary: Unexpected result of read.dbf
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Aug 19 18:08:36 CEST 2005
It really isn't clear that this is correct. The reason is correct:
read.dbf treats numeric fields with no decimals as integers, and that _is_
as stated on the help page. So it is definitely not a `bug', and reading
the help would have shown the reason for the original question.
[I in general do not reply to questions that can be answered from the help
page.]
I believe this field has been incorrectly coded as numeric, as it seems to
be a factor ('keycode'). In particular, 19 is not a valid field size for
a numeric field.
If one wants to allow this, I think we have to use double for any field
in which some value is not representable as an integer, and not just when
the field size exceeds 9. I have been working on implementing that.
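A minimal sketch of the value-based rule described above, written in C to match dbfopen.c. It assumes the raw, space-padded text of every cell in the column is available before the column type is fixed. The names `FieldKind`, `fits_in_r_integer` and `choose_field_kind` are hypothetical and are not part of the foreign package or shapelib.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical classification; not the foreign package's API. */
typedef enum { FIELD_INTEGER, FIELD_DOUBLE } FieldKind;

/* A dbf cell consisting only of spaces is treated as a missing value. */
static int is_blank(const char *s)
{
    while (*s == ' ')
        s++;
    return *s == '\0';
}

/* 1 if the cell holds a plain integer that R can store as integer.
   R reserves INT_MIN for NA_integer_, so the usable range is
   INT_MIN + 1 .. INT_MAX. */
static int fits_in_r_integer(const char *s)
{
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10); /* skips leading spaces */
    if (end == s || errno == ERANGE)
        return 0;                       /* no digits, or beyond long long */
    while (*end == ' ')
        end++;                          /* allow trailing padding */
    if (*end != '\0')
        return 0;                       /* e.g. "1.5" or "1e5" */
    return v > INT_MIN && v <= INT_MAX;
}

/* Double if the field declares decimals or if ANY non-missing value
   does not fit in an R integer; integer otherwise. */
static FieldKind choose_field_kind(const char *const *cells, size_t n,
                                   int decimals)
{
    assert(cells != NULL || n == 0);
    if (decimals > 0)
        return FIELD_DOUBLE;
    for (size_t i = 0; i < n; i++) {
        if (is_blank(cells[i]))
            continue;                   /* missing values impose no constraint */
        if (!fits_in_r_integer(cells[i]))
            return FIELD_DOUBLE;
    }
    return FIELD_INTEGER;
}
```

The trade-off is that the column type can only be decided after every record has been examined, whereas a rule based on the declared field size needs only the header.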
On Fri, 19 Aug 2005, Susumu Tanimura wrote:
> Hi there,
>
> This is a summary of, and a patch for, a bug in read.dbf, demonstrated in
> Message-Id: <20050818150446.697835cb.stanimura-ngs at umin.ac.jp>.
>
> After consulting Rjpwiki, an online community of R users in Japan, we
> found the cause, and a patch was proposed.
>
> Integer overflow occurs when read.dbf reads a dbf file containing a
> numeric field whose values are too large for a signed integer. For example,
>
> $ dbf2txt test.dbf
> #KEYCODE
> 422010010
> 42201002101
> 42201002102
> 42201002103
> 42201002104
> 422010060
> 422010071
> 422010072
> 42201008001
> 42201008002
>
> The KEYCODE field is of numeric type, 19 digits wide, with no decimals.
> You can create this file with OpenOffice.org Calc, txt2dbf, and so on.
> Also prepare the same data as a CSV file.
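A short worked check of why values such as 42201002101 cannot come back as integers, assuming a 32-bit `int` as R uses: a field of width w with no decimals can hold values up to 10^w - 1, and that maximum stays at or below INT_MAX = 2147483647 only for w <= 9. The helper names `max_for_width` and `width_always_fits_int` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Largest unsigned value a no-decimal field of the given width can hold:
   10^width - 1. Width is capped at 18 so the result fits in long long. */
static long long max_for_width(int width)
{
    assert(width >= 1 && width <= 18);
    long long m = 1;
    for (int i = 0; i < width; i++)
        m *= 10;
    return m - 1;
}

/* 1 if every value a field of this width could contain fits in an int.
   Negative values are never the limiting case: the minus sign uses up
   one of the characters, so their magnitude is smaller. */
static int width_always_fits_int(int width)
{
    return max_for_width(width) <= INT_MAX;
}
```

The 11-digit KEYCODE values exceed INT_MAX while the 9-digit ones do not, which matches exactly the rows that came back as NA in the comparison below.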
>
> > library(foreign)
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
> KEYCODE KEYCODE
> 1 422010010 422010010
> 2 42201002101 NA
> 3 42201002102 NA
> 4 42201002103 NA
> 5 42201002104 NA
> 6 422010060 422010060
> 7 422010071 422010071
> 8 422010072 422010072
> 9 42201008001 NA
> 10 42201008002 NA
>
> The problem does not occur when the field has decimals, e.g. a numeric
> field 19 digits wide with 5 decimals.
>
> The patch, written by Mr. Eiji Nakama, follows.
>
> --- foreign.orig/src/dbfopen.c 2005-08-19 18:54:06.000000000 +0900
> +++ foreign/src/dbfopen.c 2005-08-19 18:58:06.000000000 +0900
> @@ -970,7 +970,8 @@
> || psDBF->pachFieldType[iField] == 'F' )
> /* || psDBF->pachFieldType[iField] == 'D' ) D is Date */
> {
> - if( psDBF->panFieldDecimals[iField] > 0 )
> + if( psDBF->panFieldDecimals[iField] > 0 ||
> + psDBF->panFieldSize[iField] > 9 )
> return( FTDouble );
> else
> return( FTInteger );
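For readers who want to check the patched condition in isolation, here is a minimal restatement in C. `FT_INTEGER` and `FT_DOUBLE` are hypothetical stand-ins for the `FTInteger` and `FTDouble` values returned in dbfopen.c, and the functions cover only the numeric branch shown in the diff.

```c
#include <assert.h>

/* Hypothetical stand-ins for the FTInteger / FTDouble return values. */
typedef enum { FT_INTEGER, FT_DOUBLE } NumericFieldType;

/* Rule before the patch: only declared decimals force double. */
static NumericFieldType type_before_patch(int decimals, int size)
{
    assert(decimals >= 0 && size > 0);
    return decimals > 0 ? FT_DOUBLE : FT_INTEGER;
}

/* Rule after the patch: fields wider than 9 characters are also read as
   double. A 9-character field holds at most 999999999, which fits in a
   32-bit int; a 10-character field may hold 9999999999, which does not. */
static NumericFieldType type_after_patch(int decimals, int size)
{
    assert(decimals >= 0 && size > 0);
    return (decimals > 0 || size > 9) ? FT_DOUBLE : FT_INTEGER;
}
```

For the KEYCODE field (size 19, no decimals) the old rule yields integer and the patched rule yields double.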
>
> After applying the patch, read.dbf works correctly.
>
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
> KEYCODE KEYCODE
> 1 422010010 422010010
> 2 42201002101 42201002101
> 3 42201002102 42201002102
> 4 42201002103 42201002103
> 5 42201002104 42201002104
> 6 422010060 422010060
> 7 422010071 422010071
> 8 422010072 422010072
> 9 42201008001 42201008001
> 10 42201008002 42201008002
>
> --
> Susumu Tanimura
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595