[R] read.spss and umlaut
Thomas Lumley
tlumley at u.washington.edu
Wed Aug 2 17:11:13 CEST 2006
On Wed, 2 Aug 2006, Thomas Kuster wrote:
> Hello
>
> When I read an SPSS *.por file with read.spss, everything after an umlaut is
> missing:
This sounds like a conflict between encodings: e.g. if R is assuming UTF-8
and the file is encoded in Latin-1, then the sequence
U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
U+0072 : LATIN SMALL LETTER R
is stored as the two bytes FC 72 in the file, which is an illegal byte
sequence in UTF-8.
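A minimal sketch of that byte-level mismatch, for anyone who wants to check
it by hand. It assumes a more recent R than the 2.1.0 mentioned in the
question (validUTF8() needs R >= 3.3.0), and the string "für" is only an
illustration built from raw bytes, not something read from the .por file:

```r
# "für" as it would be stored in a Latin-1 file: bytes 66 FC 72
latin1_bytes <- as.raw(c(0x66, 0xfc, 0x72))
raw_label <- rawToChar(latin1_bytes)

# Interpreted as UTF-8, the byte FC followed by 72 is not a valid sequence
is_valid_before <- validUTF8(raw_label)

# Declaring the real source encoding and converting gives valid UTF-8
fixed <- iconv(raw_label, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(fixed)

# Code points after conversion: f, u-with-diaeresis (U+00FC), r
code_points <- utf8ToInt(fixed)
```

If labels do come through untruncated but mis-encoded, the same iconv() call
can be applied to the names of the value.labels attribute.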
The underlying C code (written in the US quite a long time ago) doesn't
know about encodings, and I don't know what the rules are in SPSS for valid
characters. I suspect that in these old portable file formats it probably
just reads and writes bytes, leaving it up to the OS to interpret them.
You could try running R in a non-UTF-8 locale to see if it helps.
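One way to try this from inside a session is sketched below. The helper
read_spss_in_locale() is hypothetical (it is not part of the foreign package),
and the locale name "de_CH.ISO-8859-1" is an assumption: locale names are
platform-dependent and the locale has to be installed on the system. I have
not verified that switching LC_CTYPE mid-session is enough; starting R with
the locale already set may be needed instead.

```r
# Hypothetical helper: read a .por file under a temporary LC_CTYPE setting
# and restore the previous locale afterwards.
read_spss_in_locale <- function(path, locale = "de_CH.ISO-8859-1") {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(new)) {
    message("Locale ", locale, " is not available on this system")
    return(NULL)
  }
  foreign::read.spss(path)
}

# Only attempt the read if the file from the original question is present
if (file.exists("projets.por")) {
  spssdaten <- read_spss_in_locale("projets.por")
}
```

Sys.getlocale("LC_CTYPE") shows which locale the session is currently using.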
If anyone has definitive information about how SPSS represents strings and
decides on valid characters, that might be useful too.
-thomas
>> library("foreign")
>> spssdaten <- read.spss("projets.por")
>> attr(spssdaten$PROJETX, "value.labels")[1:20]
> Bg Stammzellenforschung Bb
> 863 862
> Bb Neugestaltung des Finanzausgleichs
> 861 854
> EV Postdienste f Bb
> 853 852
> Bb Bg Steuerpaket
> 851 843
> Bb Anhebung der Mehrwertsteuer s 11. AHV-Revision
> 842 841
> Volkinitiative Lebenslange Verwahrung
> 833 832
> Gegenentwurf zur Avanti EV Lehrstellen-Initiative
> 831 824
> EV Moratorium Plus EV Strom ohne Atom
> 823 822
> EV Ja zu fairen Mieten EV Gleiche Rechte f
> 821 815
> EV Gesundheitsinitiative EV Sonntags-Initiative
> 814 813
>
> The SPSS-File is okay:
>> system("cat projets.por |grep Postdienste")
> echtserwerb 3. GenerationSD/N/EV Postdienste für alleSE/16/Änderrung Bg EOG
> Mut
>
> How can I read the SPSS-File with the Umlaut?
>
> Bye
> Thomas Kuster
>
> R: 2.1.0 (2005-04-18)
> OS: Debian Linux, 2.6.10-isgee-neptun-1
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle