[R] Possible bug in foreign library import of Stata datasets
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Wed Apr 28 10:14:43 CEST 2004
On 28-Apr-04 Paul Johnson wrote:
> The negative valued observations get mixed up in R:
>
> > library(foreign)
> > dat2 <- read.dta("table2.dta")
> > table(deml)
> deml
>    0    1    2    3    4    5    6    7    8    9   10  246  247
>   94  103  169  108  404  634  154  281  923  258 2352  826 3829
>  248  249  250  251  252  253  254  255
> 2161 6847  541  451  152  306  145  252
>
> The read.dta has translated the negative values as (256-deml).
>
> Is this the kind of thing that is a bug, or have I missed something in
> the documentation about the handling of negative numbers? Should a
> formal bug report be filed?
This observation suggests a fairly clear diagnosis: the original
negative numbers (tabulated in Stata as "-10.00" etc.) are stored as
what C would call "signed char" (N for N = 0 to 127, N-256 for
N = 128 to 255), but are being interpreted as unsigned integers in
the range 0 to 255, so that -10 comes through as 246 and -1 as 255.
An unusual though feasible type.
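If that diagnosis is right, the values could be recovered after import
by undoing the wrap-around. A minimal sketch, assuming plain
two's-complement bytes and ignoring any special codes Stata may reserve
in that range; the helper name to_signed_byte is made up for
illustration:

```r
# Hypothetical helper: map values read as unsigned bytes (0..255)
# back to the signed-char interpretation (-128..127).
to_signed_byte <- function(x) ifelse(x > 127, x - 256, x)

observed <- c(0, 10, 127, 246, 255)
to_signed_byte(observed)
# [1]   0  10 127 -10  -1
```

Applied to the column in question this would be something like
dat2$deml <- to_signed_byte(dat2$deml), but only as a workaround.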
The question is where this is occurring. The Stata tabulation
displays them as apparent reals, but the storage in the .dta file
may be 1-byte for economy of space. If so, then whether or not this
is a bug in read.dta may depend on whether the .dta file includes a
"flag" marking such 1-byte data as signed values (and possibly a
further flag for real versus integer types). If it does not, then
signed 1-byte data will not be distinguishable from unsigned 1-byte
integers (C's "unsigned char"), and read.dta can hardly be blamed
for getting the wrong impression.
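The ambiguity is easy to reproduce in R itself, since readBin() will
read the same bytes either way depending on its "signed" argument.
This only illustrates the two interpretations on a made-up raw vector;
it says nothing about what read.dta actually does internally:

```r
# Four example bytes (made up for illustration): 0x00 0x0a 0xf6 0xff
bytes <- as.raw(c(0, 10, 246, 255))

# Same bytes read as signed 1-byte integers ...
as_signed <- readBin(bytes, what = "integer", n = length(bytes),
                     size = 1, signed = TRUE)
# ... and as unsigned 1-byte integers
as_unsigned <- readBin(bytes, what = "integer", n = length(bytes),
                       size = 1, signed = FALSE)

as_signed
# [1]   0  10 -10  -1
as_unsigned
# [1]   0  10 246 255
```

Nothing in the bytes themselves says which reading is intended; that
has to come from the file format or from its reader.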
Since I'm not familiar with Stata data file formats, I can't
comment further!
Ted.