[Rd] foreign generates bad Stata data files (PR#13820)
peter.muhlberger at gmail.com
Sat Jul 11 22:05:11 CEST 2009
Full_Name: peter muhlberger
Version: 2.7.1
OS: Ubuntu x86_64 dual core
Submission from: (NULL) (70.238.206.13)
I've spent half a day generating .dta files using write.dta only to have them
crash my copy of Stata. I eventually discovered that removing a string variable
with a maximum observed length of 280 characters allows Stata to read the file
without problems. Stata limits the length of a string variable to 244
characters. R gives no warning about this problem, and I assume it does not
truncate the strings either. The following code creates a .dta file that causes
my copy of Stata to suddenly disappear when I try to open the file:
library(foreign)
# A single 280-character string, longer than Stata's 244-character limit
x=paste(rep("X", 280), collapse="")
x=as.data.frame(x, stringsAsFactors=FALSE)
names(x)[1]="x"
write.dta(x, version = 10, file="/home/peterm/Desktop/x.dta")
That's the main problem I wanted to report. There are several others you might
want to look into. Above, leave out names(x)[1]="x" and take a look at the name
of the variable in the data frame: it's a line of code. Also, write.dta is
supposed to turn factors into variable labels in Stata, but I get no variable
labels in Stata (starting with data that has factors). Finally, when I try to
read an SPSS dataset created by Sawtooth into R using read.spss, I get a
multitude of variables that aren't in the original dataset and have nothing in
them.
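For the string-length problem, here is a minimal sketch of a check that could be run before write.dta. It assumes the 244-character limit mentioned above applies to the target Stata version, and it measures length in bytes. long_string_columns is a hypothetical helper name, not something in foreign, and it uses only base R.

```r
# Hypothetical helper (not part of foreign): report character columns whose
# longest value exceeds a given limit. The default of 244 is the Stata
# string limit quoted in this report; treat it as an assumption.
long_string_columns <- function(df, limit = 244L) {
  is_chr <- vapply(df, is.character, logical(1))
  # Longest value (in bytes) per character column; NA values are ignored
  max_len <- vapply(df[is_chr], function(col) {
    lens <- nchar(col[!is.na(col)], type = "bytes")
    if (length(lens)) max(lens) else 0L
  }, integer(1))
  max_len[max_len > limit]
}

x <- data.frame(x = paste(rep("X", 280), collapse = ""),
                ok = "short", stringsAsFactors = FALSE)
too_long <- long_string_columns(x)
print(too_long)

# One possible workaround: truncate the offending columns before writing.
# substr() counts characters, so this is only byte-safe for ASCII text.
for (nm in names(too_long)) x[[nm]] <- substr(x[[nm]], 1, 244)
```

After the truncation step, long_string_columns(x) should report nothing, and the data frame can be passed to write.dta as before. Whether truncating or dropping the variable is appropriate depends on the data.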
Below, my sessionInfo:
> sessionInfo()
R version 2.7.1 (2008-06-23)
x86_64-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreign_0.8-36 JGR_1.6-7 iplots_1.1-3 JavaGD_0.5-2 rJava_0.6-3 boot_1.2-37
Hope this helps!
Cheers,
Peter