[R] read in Stata and SPSS with value labels/formats

Xu Jun junxu.r at gmail.com
Thu Jan 19 20:19:55 CET 2012


Sorry, I forgot the subject line last time.

Dear R experts,

I am using the foreign package to read in Stata and SPSS format data
files (the same data, saved in both formats). I first tried
read.dta for the Stata format:


##########################
> library(foreign)
> mystata <- read.dta("data/hlthintl.dta", convert.factor=FALSE)
Error in read.dta("data/hlthintl.dta", convert.factor = FALSE) :
 a binary read error occurred
##########################

Then, in Stata, I saved the file in the old format without value labels:
************************************
use "data\hlthintl.dta", clear
saveold "data\hlthintlold.dta", nolabel
************************************

I could then read hlthintlold.dta into R without problems, but of course
without value labels. To keep the value labels, I turned to
SPSS. Here is what I did and what I got:

#########################
> myspss <- read.spss("data/hlthintl.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In read.spss("data/hlthintl.sav",  ... :
 data/hlthintl.sav: File contains duplicate label for value 276.2 for
variable V4
2: In read.spss("data/hlthintl.sav",  ... :
 data/hlthintl.sav: File contains duplicate label for value 376.2 for
variable V4
3: In read.spss("data/hlthintl.sav",  ... :
 data/hlthintl.sav: File contains duplicate label for value 826.2 for
variable V4
4: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
 longer object length is not a multiple of shorter object length
5....
6....
...
...
50.....
########################


Warnings 5-50 are the same as warning 4. Most of the data now come
into R correctly, but when I check the occupation variable, it has
lost all of its numeric codes (all frequencies are zero):


########################
> table(myspss$occupation)

                                               ARMED FORCES
                                                          0
                                                   Soldiers
                                                          0
                                                   Officers
                                                          0
...
...
...
...

             Hand packers and other manufacturing labourers
                                                          0
                   TRANSPORT LABOURERS AND FREIGHT HANDLERS
                                                          0
                              Hand or pedal vehicle drivers
                                                          0
             Drivers of animal-drawn vehicles and machinery
                                                          0
                                           Freight handlers
                                                          0
                                                    Refused
                                                          0
                                                  Dont know
                                                          0
Warning message:
In `levels<-`(`*tmp*`, value = c("ARMED FORCES", "Soldiers", "Officers",  :
 duplicated levels will not be allowed in factors anymore
########################################
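
One workaround I am considering is to read the file with
use.value.labels=FALSE, which (as far as I understand) keeps the numeric
codes and stores the labels in a "value.labels" attribute, and then to
build the factor by hand after making duplicated labels unique. The
sketch below uses made-up codes and labels in place of my real
occupation variable, so the objects codes and value_labels are
assumptions, not my actual data:

```r
# Hypothetical stand-in for myspss$occupation as it might look when read
# with use.value.labels = FALSE: numeric codes only.
codes <- c(110, 100, 110, 9999, 9998)

# Hypothetical stand-in for attr(myspss$occupation, "value.labels"):
# names are the labels, values are the codes. "Refused" appears twice.
value_labels <- c("ARMED FORCES" = 100, "Soldiers" = 110,
                  "Refused" = 9998, "Refused" = 9999)

# Which labels are attached to more than one code?
dup_labels <- unique(names(value_labels)[duplicated(names(value_labels))])

# Make the labels unique so that each code gets its own factor level.
unique_labels <- make.unique(names(value_labels))

# Build the factor from the codes, one level per code.
occupation <- factor(codes,
                     levels = unname(value_labels),
                     labels = unique_labels)

# In this toy example the frequencies are counted per code.
occ_table <- table(occupation)
print(dup_labels)
print(occ_table)
```

With the real data, codes and value_labels would presumably come from
the occupation column and its "value.labels" attribute. I have not yet
checked whether this also explains the all-zero frequencies above.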

Any thoughts or suggestions? Thanks a lot!

Jun Xu, PhD
Assistant Professor
Department of Sociology
Ball State University


