[R] read in Stata and SPSS with value labels/formats
Frank Harrell
f.harrell at vanderbilt.edu
Thu Jan 19 23:45:46 CET 2012
require(Hmisc)
?spss.get
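
A minimal sketch of what that suggestion might look like in practice. The wrapper name read_spss_labelled is hypothetical (it is not part of Hmisc), and the file path and the occupation variable are taken from the quoted message below. Note that spss.get calls read.spss from the foreign package internally, so some of the same warnings may still appear.

```r
# Hypothetical helper (not part of Hmisc): read an SPSS file with value labels
# through Hmisc::spss.get, returning NULL if the file or the package is missing.
read_spss_labelled <- function(path) {
  if (!file.exists(path)) return(NULL)
  if (!requireNamespace("Hmisc", quietly = TRUE)) return(NULL)
  Hmisc::spss.get(path, use.value.labels = TRUE, to.data.frame = TRUE)
}

# Path as used in the quoted message; adjust to your own setup.
myspss <- read_spss_labelled("data/hlthintl.sav")
if (!is.null(myspss)) str(myspss$occupation)
```
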
Xu Jun wrote
>
> Sorry I forgot the subject line last time
>
> Dear R experts,
>
> I am using the foreign package to read in Stata and SPSS format data
> files (the same data, saved in each format). I first tried
> read.dta for the Stata format:
>
>
> ##########################
>> library(foreign)
>> mystata <- read.dta("data/hlthintl.dta", convert.factor=FALSE)
> Error in read.dta("data/hlthintl.dta", convert.factor = FALSE) :
> a binary read error occurred
> ##########################
>
> Then I tried saving this Stata file in an older file format, without
> labels, from within Stata:
> ************************************
> use "data\hlthintl.dta", clear
> saveold "data\hlthintlold.dta", nolabel
> ************************************
>
> Then I read the hlthintlold.dta into R without problems, but of course
> without value labels. Well, to keep these value labels, I turned to
> SPSS. Here is what I did and got:
>
> #########################
>> myspss <- read.spss("data/hlthintl.sav", use.value.labels=TRUE,
> max.value.labels=Inf, to.data.frame=TRUE)
> There were 50 or more warnings (use warnings() to see the first 50)
>> warnings()
> Warning messages:
> 1: In read.spss("data/hlthintl.sav", ... :
> data/hlthintl.sav: File contains duplicate label for value 276.2 for
> variable V4
> 2: In read.spss("data/hlthintl.sav", ... :
> data/hlthintl.sav: File contains duplicate label for value 376.2 for
> variable V4
> 3: In read.spss("data/hlthintl.sav", ... :
> data/hlthintl.sav: File contains duplicate label for value 826.2 for
> variable V4
> 4: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 5....
> 6....
> ...
> ...
> 50.....
> ########################
>
>
> Warnings 5-50 are the same as warning 4. Most of the data now transfers
> into R correctly, but when I check an occupation variable, it has lost
> all its numeric coding (the frequencies are all zero):
>
>
> ########################
>> table(myspss$occupation)
>
> ARMED FORCES
> 0
> Soldiers
> 0
> Officers
> 0
> ...
> ...
> ...
> ...
>
> Hand packers and other manufacturing labourers
> 0
> TRANSPORT LABOURERS AND FREIGHT HANDLERS
> 0
> Hand or pedal vehicle drivers
> 0
> Drivers of animal-drawn vehicles and machinery
> 0
> Freight handlers
> 0
> Refused
> 0
> Dont know
> 0
> Warning message:
> In `levels<-`(`*tmp*`, value = c("ARMED FORCES", "Soldiers", "Officers", :
> duplicated levels will not be allowed in factors anymore
> ########################################
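
The last warning suggests that some label text occurs more than once in the value-label table, which factor levels do not tolerate. One possible workaround, sketched here on toy data rather than the real file, is to read the raw codes (read.spss with use.value.labels=FALSE keeps the label table in a "value.labels" attribute, with the label text as names) and attach de-duplicated labels yourself. The codes and labels below are invented for illustration, and label_codes is a hypothetical helper; whether this is what causes the zero frequencies in the real file is an assumption.

```r
# Toy stand-in for one SPSS variable (invented data): numeric codes plus a
# value-label table in which two different codes share the same label text.
codes  <- c(110, 5110, 9999, 110)
labels <- c("Soldiers" = 110, "Soldiers" = 5110, "Refused" = 9999)

# Hypothetical helper: turn codes into a factor, making repeated label text
# unique first so that it can serve as factor levels.
label_codes <- function(x, value_labels) {
  lev <- make.unique(names(value_labels), sep = " #")
  factor(x, levels = unname(value_labels), labels = lev)
}

occ <- label_codes(codes, labels)
table(occ)
```

With the real data, the same helper could be applied to a variable x read with use.value.labels=FALSE as label_codes(x, attr(x, "value.labels")).
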
>
> Any thoughts or suggestions? Thanks a lot!
>
> Jun Xu, PhD
> Assistant Professor
> Department of Sociology
> Ball State University
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University