[R] Possible bug in foreign library import of Stata datasets
Paul Johnson
pauljohn at ku.edu
Wed Apr 28 08:16:03 CEST 2004
Concerning this article, Christopher Zorn, "Generalized Estimating
Equation Models for Correlated Data: A Review with Applications." 2001.
American Journal of Political Science 45(April):470-90.
The author very kindly provides data for replication on his web page:
http://www.emory.edu/POLS/zorn/Data/GEE.zip.
I've been comparing Professor Zorn's Stata results with the ones I
get in R. I ran into some trouble with the results in Table 2, and I
traced the problem back to the data import in R's foreign package.
Observe the variable "deml" in the Stata output:
table deml
----------------------
 Lower of |
      two |
   POLITY |
democracy |
        s |      Freq.
----------+-----------
   -10.00 |        826
    -9.00 |      3,829
    -8.00 |      2,161
    -7.00 |      6,847
    -6.00 |        541
    -5.00 |        451
    -4.00 |        152
    -3.00 |        306
    -2.00 |        145
    -1.00 |        252
     0.00 |         94
     1.00 |        103
     2.00 |        169
     3.00 |        108
     4.00 |        404
     5.00 |        634
     6.00 |        154
     7.00 |        281
     8.00 |        923
     9.00 |        258
    10.00 |      2,352
----------------------
The negative-valued observations get mixed up in R:
> library(foreign)
> dat2 <- read.dta("table2.dta")
> with(dat2, table(deml))
deml
   0    1    2    3    4    5    6    7    8    9   10  246  247
  94  103  169  108  404  634  154  281  923  258 2352  826 3829
 248  249  250  251  252  253  254  255
2161 6847  541  451  152  306  145  252
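read.dta has translated each negative value into (256 + deml), so that
-10 shows up as 246 and -1 as 255. That pattern is what one would see
if a signed one-byte integer were being read as unsigned.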
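
To make the arithmetic concrete, here is a small sketch of what seems
to be happening, plus a stopgap recode. It rests on an assumption I
have not verified against the file format: that deml is stored as a
signed one-byte integer and is being read as unsigned. The helper
name fix_signed_byte is my own invention, not part of foreign.

```r
# A few values as they appear after the import (taken from the R table).
imported <- c(0, 10, 246, 250, 255)

# ASSUMPTION: the variable is a signed one-byte integer on disk, so any
# value above 127 is really a negative number shifted up by 256.
# fix_signed_byte is a hypothetical helper, not a function from foreign.
fix_signed_byte <- function(x) ifelse(x > 127, x - 256, x)

fixed <- fix_signed_byte(imported)
print(fixed)

# The same byte read two ways with base R's readBin():
b <- as.raw(246)
signed_val   <- readBin(b, what = "integer", size = 1, signed = TRUE)
unsigned_val <- readBin(b, what = "integer", size = 1, signed = FALSE)
print(c(signed = signed_val, unsigned = unsigned_val))
```

Applied to the real data, this would be something like
dat2$deml <- fix_signed_byte(dat2$deml), but only as a workaround
until the import itself is sorted out, and only for variables known
to lie in the one-byte range.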
Is this a bug, or have I missed something in the documentation about
the handling of negative numbers? Should a formal bug report be
filed?
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700