[R] Possible bug in foreign library import of Stata datasets
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Apr 28 10:04:11 CEST 2004
Paul Johnson <pauljohn at ku.edu> writes:
> Concerning this article, Christopher Zorn, "Generalized Estimating
> Equation Models for Correlated Data: A Review with Applications."
> 2001. American Journal of Political Science 45(April):470-90.
>
> The author very kindly provides data for replication on his web page:
> http://www.emory.edu/POLS/zorn/Data/GEE.zip.
>
> I've been comparing Professor Zorn's results obtained with Stata
> and R. I ran into some trouble with the results in Table 2. I traced
> the problem back to the R foreign library's data import. Observe the
> variable "deml" in the Stata output:
>
>
> table deml
>
> ----------------------
>  Lower of |
>       two |
>    POLITY |
> democracy |
>         s |      Freq.
> ----------+-----------
>    -10.00 |        826
>     -9.00 |      3,829
>     -8.00 |      2,161
>     -7.00 |      6,847
>     -6.00 |        541
>     -5.00 |        451
>     -4.00 |        152
>     -3.00 |        306
>     -2.00 |        145
>     -1.00 |        252
>      0.00 |         94
>      1.00 |        103
>      2.00 |        169
>      3.00 |        108
>      4.00 |        404
>      5.00 |        634
>      6.00 |        154
>      7.00 |        281
>      8.00 |        923
>      9.00 |        258
>     10.00 |      2,352
> ----------------------
>
>
> The negative valued observations get mixed up in R:
>
> > library(foreign)
> > dat2 <- read.dta("table2.dta")
> > table(deml)
> deml
>    0    1    2    3    4    5    6    7    8    9   10  246  247
>   94  103  169  108  404  634  154  281  923  258 2352  826 3829
>  248  249  250  251  252  253  254  255
> 2161 6847  541  451  152  306  145  252
>
> read.dta has translated the negative values as (256 + deml), so -10 shows up as 246.
>
> Is this the kind of thing that is a bug, or have I missed something in
> the documentation about the handling of negative numbers? Should a
> formal bug report be filed?
Looks like a classic signed/unsigned confusion: negative numbers are
stored in two's-complement format in single bytes, but are getting
interpreted as unsigned. A bug report could be a good idea if the
resident Stata expert (Thomas, I believe) is unavailable just now.
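
To illustrate the arithmetic, here is a minimal sketch in R of how a byte read as unsigned (0..255) maps back to its signed value. The helper name to_signed_byte and the sample vector are hypothetical, made up for this example; this is not the code in the foreign package, and it ignores any byte codes that Stata may reserve for missing values.

```r
# Sample of values as they appear in the mis-read R table (hypothetical vector)
unsigned_vals <- c(0L, 10L, 246L, 247L, 255L)

# Reinterpret a 0..255 value as a signed two's-complement byte:
# anything above 127 has the sign bit set, so subtract 256.
to_signed_byte <- function(x) ifelse(x > 127L, x - 256L, x)

signed_vals <- to_signed_byte(unsigned_vals)
print(signed_vals)  # 246 maps to -10, 247 to -9, 255 to -1
```

Until the import is fixed, the same idea could presumably be applied to the affected column as a workaround, for example dat2$deml <- ifelse(dat2$deml > 127, dat2$deml - 256, dat2$deml), assuming deml is the only variable affected and contains no missing-value codes.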
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907