[R] Possible bug in foreign library import of Stata datasets

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Apr 28 10:04:11 CEST 2004


Paul Johnson <pauljohn at ku.edu> writes:

> Concerning this article, Christopher Zorn, "Generalized Estimating
> Equation Models for Correlated Data: A Review with Applications."
> 2001. American Journal of Political Science 45(April):470-90.
> 
> The author very kindly provides data for replication on his web page:
> http://www.emory.edu/POLS/zorn/Data/GEE.zip.
> 
> I've been comparing Professor Zorn's results as obtained with Stata
> and R.  I ran into some trouble with the results in Table 2.  I traced
> the problem back to the R foreign library's data import.  Observe the
> variable "deml" in the Stata output:
> 
> 
> table deml
> 
> ----------------------
> Lower of  |
> two       |
> POLITY    |
> democracy |
> s         |      Freq.
> ----------+-----------
>     -10.00 |        826
>      -9.00 |      3,829
>      -8.00 |      2,161
>      -7.00 |      6,847
>      -6.00 |        541
>      -5.00 |        451
>      -4.00 |        152
>      -3.00 |        306
>      -2.00 |        145
>      -1.00 |        252
>       0.00 |         94
>       1.00 |        103
>       2.00 |        169
>       3.00 |        108
>       4.00 |        404
>       5.00 |        634
>       6.00 |        154
>       7.00 |        281
>       8.00 |        923
>       9.00 |        258
>      10.00 |      2,352
> ----------------------
> 
> 
> The negative valued observations get mixed up in R:
> 
>  > library(foreign)
>  > dat2 <- read.dta("table2.dta")
>  > table(deml)
> deml
>     0    1    2    3    4    5    6    7    8    9   10  246  247
>    94  103  169  108  404  634  154  281  923  258 2352  826 3829
>   248  249  250  251  252  253  254  255
>  2161 6847  541  451  152  306  145  252
> 
> read.dta has translated each negative value as (256 + deml).
> 
> Is this the kind of thing that is a bug, or have I missed something in
> the documentation about the handling of negative numbers?  Should a
> formal bug report be filed?

Looks like a classic signed/unsigned confusion. Negative numbers are
stored in two's-complement format in single bytes, but are getting
interpreted as unsigned, so -10 shows up as 256 - 10 = 246. A bug
report could be a good idea if the resident Stata expert (Thomas, I
believe) is unavailable just now.
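
To illustrate what I think is going on, here is a minimal sketch in R.
It assumes the variable is stored as a single-byte Stata type and that
none of the values are Stata missing-value codes. The helper
as_signed_byte() is made up for this example and is not part of the
foreign package, and the input is just the set of distinct codes from
the table above rather than the real data file.

```r
# Distinct codes as reported by table() after read.dta(): each byte has
# been read as unsigned (0..255). Illustrative values, not the real file.
raw <- c(0:10, 246:255)

# Hypothetical helper (not in the foreign package): reinterpret an
# unsigned byte value as a signed two's-complement byte.
# Assumes no Stata missing-value codes are present in x.
as_signed_byte <- function(x) ifelse(x > 127, x - 256, x)

fixed <- as_signed_byte(raw)
print(fixed)
```

Until the import itself is fixed, something like
dat2$deml <- as_signed_byte(dat2$deml) might serve as a stopgap, but
check the result against the Stata frequencies before relying on it.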

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



