[R] read.spss and umlaut

Thomas Lumley tlumley at u.washington.edu
Wed Aug 2 17:11:13 CEST 2006


On Wed, 2 Aug 2006, Thomas Kuster wrote:

> Hello
>
> When I read a SPSS *.por file with read.spss everything after a umlaut is
> missing:

This sounds like a conflict between encodings -- e.g., if R is assuming UTF-8 
and the file is encoded in Latin-1, then the sequence
U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
U+0072 : LATIN SMALL LETTER R
is stored as the bytes FC 72 in the file, which is an illegal byte sequence 
in UTF-8.
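
To make the byte-level point concrete, here is a minimal sketch in R. It 
assumes a recent R (validUTF8() was added in R 3.3.0, so it is not available 
in the R 2.1.0 reported below), and the variable names are illustrative only.

```r
# The two bytes that Latin-1 uses for "u with diaeresis" followed by "r".
x <- rawToChar(as.raw(c(0xFC, 0x72)))

# Read as UTF-8, the sequence is invalid: 0xFC cannot start a character
# that is followed by the plain ASCII byte 0x72.
validUTF8(x)                    # FALSE

# Declaring the input as Latin-1 and converting gives valid UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)                    # TRUE
charToRaw(y)                    # c3 bc 72
```

A reader that stops at the first byte it cannot interpret would produce 
labels cut off just before the umlaut, which matches the output quoted below.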

The underlying C code (written in the US quite a long time ago) doesn't know 
about encodings, and I don't know what the rules are in SPSS for valid 
characters (I suspect that with these old portable file formats it probably 
just reads and writes bytes, leaving it up to the OS to interpret them).

You could try running R in a non-UTF-8 locale to see if it helps.
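
One way to try this from inside a session is sketched below. The helper name 
with_ctype and the locale name "de_CH.ISO-8859-1" are hypothetical; locale 
names are platform dependent (on Linux, `locale -a` lists the installed 
ones). Whether switching LC_CTYPE mid-session behaves the same as starting R 
in that locale is an assumption, so starting R with the locale already set in 
the environment is the more direct test.

```r
# Hypothetical helper: evaluate `code` with LC_CTYPE temporarily switched,
# then restore the previous setting even if an error occurs.
with_ctype <- function(locale, code) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old))
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(new)) stop("locale not available: ", locale)
  force(code)  # `code` is evaluated lazily, i.e. only after the switch
}

# Intended use (not run here; needs the file and an installed Latin-1 locale):
# spssdaten <- with_ctype("de_CH.ISO-8859-1",
#                         foreign::read.spss("projets.por"))
```

If the labels come through complete in the Latin-1 locale, that would support 
the encoding explanation above.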

If anyone has definitive information about how SPSS represents strings and 
decides on valid characters, that would be useful too.

 	-thomas

>> library("foreign")
>> spssdaten <- read.spss("projets.por")
>> attr(spssdaten$PROJETX, "value.labels")[1:20]
>              Bg Stammzellenforschung                                  Bb
>                                  863                                   862
> Bb Neugestaltung des Finanzausgleichs
>                                  861                                   854
>                     EV Postdienste f                                   Bb
>                                  853                                   852
>                                  Bb                         Bg Steuerpaket
>                                  851                                   843
>     Bb Anhebung der Mehrwertsteuer s                      11. AHV-Revision
>                                  842                                   841
> Volkinitiative Lebenslange Verwahrung
>                                  833                                   832
>              Gegenentwurf zur Avanti             EV Lehrstellen-Initiative
>                                  831                                   824
>                   EV Moratorium Plus                    EV Strom ohne Atom
>                                  823                                   822
>               EV Ja zu fairen Mieten                   EV Gleiche Rechte f
>                                  821                                   815
>             EV Gesundheitsinitiative                EV Sonntags-Initiative
>                                  814                                   813
>
> The SPSS-File is okay:
>> system("cat projets.por |grep Postdienste")
> echtserwerb 3. GenerationSD/N/EV Postdienste für alleSE/16/Änderrung Bg  EOG
> Mut
>
> How can I read the SPSS-File with the Umlaut?
>
> Bye
> Thomas Kuster
>
> R: 2.1.0 (2005-04-18)
> OS: Debian Linux, 2.6.10-isgee-neptun-1
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

