[R] SPSS data import: problems & work arounds for GSS surveys
John Fox
jfox at mcmaster.ca
Tue Mar 3 14:43:37 CET 2009
Dear Paul,
I encountered this problem the other day, and it went away when I updated
the foreign package from version 0.8-32 to 0.8-33.
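A minimal sketch of checking the installed foreign version from R and updating it only when it is older than 0.8-33, the release mentioned above. The helper name foreign_is_current is hypothetical; packageVersion() and install.packages() are standard functions from R's utils package.

```r
# Hypothetical helper: TRUE if the installed foreign package is at least
# min_version (0.8-33 is the release mentioned above).
foreign_is_current <- function(min_version = "0.8-33") {
  utils::packageVersion("foreign") >= min_version
}

# Install a newer foreign from CRAN only when the installed one is too old.
if (!foreign_is_current()) {
  utils::install.packages("foreign")
}
```

If foreign was already loaded when you updated it, it is safest to restart R so that the new version is the one in use.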
I hope this helps,
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Paul Johnson
> Sent: March-02-09 10:58 PM
> To: R-help
> Subject: [R] SPSS data import: problems & work arounds for GSS surveys
>
> I'm using R 2.8.1 on Ubuntu 8.10. I'm writing partly to ask what's
> wrong and partly to tell other users who search the archives that
> there is a workaround.
>
> The General Social Survey is a long-standing series of surveys
> provided by NORC (National Opinion Research Center). I have
> downloaded several years of the survey data in SPSS format (here's the
> site: http://www.norc.org/GSS+Website/Download/SPSS+Format/). When I
> try to import them using foreign, I get an error like this:
>
> > library(foreign)
> > dat <- read.spss("gss2006.sav", to.data.frame=T, trim.factor.names=T)
> Error in inherits(x, "factor") : object "cp" not found
> In addition: Warning messages:
> 1: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for variable
> TVRELIG
> 2: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for variable
> SEI
> 3: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for
> variable FIRSTSEI
> 4: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for variable
> PASEI
> 5: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for variable
> MASEI
> 6: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 99.9 for variable
> SPSEI
> 7: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File contains duplicate label for value 0.75 for
> variable YEARSJOB
> 8: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
> gss2006.sav: File-indicated character representation code (1252)
> looks like a Windows codepage
>
> No dat object is created from this.
>
>
> I have found a workaround. I installed PSPP version 0.6.0, used it
> to open the sav file, and then re-saved it in SPSS sav format.
> foreign's read.spss can open the resulting file.
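The same re-save can be scripted from R instead of done by hand in the GUI. This is a sketch under two assumptions: the pspp command-line program (distributed alongside the psppire GUI) is on your PATH, and it runs a syntax file given as its argument. GET FILE and SAVE OUTFILE are standard SPSS syntax commands that PSPP also understands; the helper names are hypothetical.

```r
# Hypothetical helper: PSPP syntax that opens a .sav file and saves it again.
# (Paths containing single quotes would need escaping; not handled here.)
pspp_resave_syntax <- function(infile, outfile) {
  c(sprintf("GET FILE='%s'.", infile),
    sprintf("SAVE OUTFILE='%s'.", outfile))
}

# Hypothetical helper: write the syntax to a temporary .sps file and run
# the pspp command-line program on it (assumes pspp is on the PATH).
resave_with_pspp <- function(infile, outfile) {
  syntax_file <- tempfile(fileext = ".sps")
  writeLines(pspp_resave_syntax(infile, outfile), syntax_file)
  status <- system2("pspp", shQuote(syntax_file))
  if (status != 0) stop("pspp exited with status ", status)
  invisible(outfile)
}

# Usage (assuming gss2006.sav is in the working directory):
#   resave_with_pspp("gss2006.sav", "gss-pspp.sav")
```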
>
> I still see the warnings about duplicate value labels, but as far as I
> can tell these are harmless. A working object is obtained like so:
>
> > dat <- read.spss("gss-pspp.sav")
> Warning messages:
> 1: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for
> variable TVRELIG
> 2: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 0.75 for
> variable YEARSJOB
> 3: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> SEI
> 4: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for
> variable FIRSTSEI
> 5: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> PASEI
> 6: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> MASEI
> 7: In read.spss("gss-pspp.sav") :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> SPSEI
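When a call produces a long run of warnings, it can help to collect them and count the distinct messages instead of reading them one by one. A minimal sketch using base R's withCallingHandlers(); collect_warnings is a hypothetical helper name.

```r
# Hypothetical helper: evaluate expr, record every warning message, and
# muffle the console output so the messages can be inspected afterwards.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Usage (assuming gss-pspp.sav exists):
#   res <- collect_warnings(foreign::read.spss("gss-pspp.sav", to.data.frame = TRUE))
#   sort(table(res$warnings), decreasing = TRUE)
```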
>
>
> There is still some trouble importing this SPSS file, however. I
> think it has the symptoms of a non-rectangular data array. What do you
> make of these warnings?
>
> > dat <- read.spss("gss-pspp.sav",to.data.frame=T)
> There were 22 warnings (use warnings() to see them)
> > warnings()
> Warning messages:
> 1: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for
> variable TVRELIG
> 2: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 0.75 for
> variable YEARSJOB
> 3: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> SEI
> 4: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for
> variable FIRSTSEI
> 5: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> PASEI
> 6: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> MASEI
> 7: In read.spss("gss-pspp.sav", to.data.frame = T) :
> gss-pspp.sav: File contains duplicate label for value 99.9 for variable
> SPSEI
> 8: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 9: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 10: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 11: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 12: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 13: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 14: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 15: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 16: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 17: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 18: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 19: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 20: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 21: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
> 22: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
> longer object length is not a multiple of shorter object length
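Warnings 8 to 22 quote an expression of the form xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]], which appears to come from the step where read.spss applies SPSS user-defined missing values (a range plus one discrete value). If that reading is right, one hedged workaround is to skip that step. This sketch assumes your version of foreign has the use.missings argument to read.spss, as current releases do; with it set to FALSE, values the file declares as missing stay as ordinary values that you must recode to NA yourself. Updating foreign, as John Fox suggests, is the more direct fix. read_spss_raw is a hypothetical name.

```r
# Hypothetical wrapper: read an SPSS system file into a data frame without
# applying the file's user-defined missing values (use.missings = FALSE).
# Values declared missing in SPSS are kept as ordinary values and must be
# recoded to NA by hand afterwards.
read_spss_raw <- function(path) {
  foreign::read.spss(path, to.data.frame = TRUE, use.missings = FALSE)
}

# Usage (assuming gss-pspp.sav exists):
#   dat <- read_spss_raw("gss-pspp.sav")
```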
>
>
> While puzzling over this, I tested the SPSS functions in the
> memisc package. They have some truly handy features! Read ?importer
> and you'll see that memisc can generate a list of variables as well as a
> codebook. It can also handle SPSS portable files.
> The importer works a little like SPSS, actually: the metadata
> is read right away, but the data are not loaded until later (as far as
> I can tell, one must run either subset or as.data.set to force the
> actual read). One can generate the description and codebook
> without accessing the data.
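The importer workflow described above can be wrapped in a small helper that picks the memisc constructor from the file extension: spss.system.file() for .sav files and spss.portable.file() for .por files. This is a sketch; the helper names are hypothetical, and the usage lines assume memisc is installed and that description() and codebook() accept an importer object, as ?importer suggests.

```r
# Hypothetical helper: name of the memisc function that opens this file type.
spss_reader_for <- function(path) {
  if (grepl("\\.por$", path, ignore.case = TRUE)) {
    "spss.portable.file"   # SPSS portable file
  } else {
    "spss.system.file"     # SPSS system (.sav) file
  }
}

# Hypothetical helper: create an importer; only the metadata is read here.
open_spss_importer <- function(path) {
  reader <- getExportedValue("memisc", spss_reader_for(path))
  reader(path)
}

# Usage (assuming memisc is installed and gss2006.sav exists):
#   idat <- open_spss_importer("gss2006.sav")
#   memisc::description(idat)   # variable labels
#   memisc::codebook(idat)      # value labels and summaries
```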
>
> > idat <- spss.system.file("gss2006.sav")
> > show(idat)
>
> SPSS system file 'gss2006.sav'
> with 5137 variables and 4510 observations
>
> The subset function reads only the selected variables from the data:
>
>
> > idat2 <- subset(idat, select=c(gunlaw))
> > idat2
>
> Data set with 4510 observations and 1 variables
>
> gunlaw
> 1 OPPOSE
> 2 *NAP
> 3 *NAP
> 4 FAVOR
> 5 FAVOR
> 6 *NAP
> 7 FAVOR
> 8 *NAP
> 9 FAVOR
> 10 FAVOR
> 11 FAVOR
> 12 FAVOR
> 13 FAVOR
> 14 *NAP
> 15 *NAP
> 16 *NAP
> 17 FAVOR
> 18 *NAP
> 19 FAVOR
> 20 *NAP
> 21 *NAP
> 22 OPPOSE
> 23 *NAP
> 24 *NAP
> 25 *NAP
> .. ......
> (25 of 4510 observations shown)
>
> The function as.data.set forces a full read of all the data
> columns:
>
>
> > idat3 <- as.data.set(idat)
> >
>
> > table(idat3$gunlaw, idat2$gunlaw)
>
>        0    1    2    8    9
>   0 2507    0    0    0    0
>   1    0 1568    0    0    0
>   2    0    0  395    0    0
>   8    0    0    0   35    0
>   9    0    0    0    0    5
>
>
> So, in conclusion, I've run into trouble with read.spss in foreign, but
> I have been able to work around it by accessing the data with PSPP or
> with the functions from the memisc package. The only advantage of using
> the PSPP program (its GUI is psppire) is that you can see the data in a
> rectangular spreadsheet that is more or less searchable. It has the
> same hard-to-use interface pioneered by SPSS (it hides variable names
> and displays descriptions in choosers), but the rectangular display in
> PSPP is nice.
>
> pj
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.