[R] read.spss and encodings

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Feb 1 13:52:48 CET 2007


Hi!

I'm having trouble importing SPSS files containing non-ASCII characters 
(R 2.4.1, Debian Linux, i386). To reproduce:

Download the following file: 
http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav

library(foreign)
Sys.setlocale(locale = "C")
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1] im B\374ro
# Levels: im B\374ro zuhause

\374 is the octal escape for the byte 0xFC, which is a u-umlaut in Latin-1. 
I guess in the C locale it's not expected to print as such. But now try this 
(use any UTF-8 locale you have installed):

Sys.setlocale(locale = "de_DE.UTF-8")
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1]Error in print.default(xx, quote = quote, ...) :
#        invalid multibyte string

To me it looks like read.spss() would need an encoding parameter and/or some 
iconv() magic: the strings are read as raw Latin-1 bytes, which are not valid 
UTF-8. Locale conversion always makes my head spin, so I thought I'd better 
post here before calling this a bug in R. Two questions:

1) Is there some way to work around this, i.e. to make sure the strings are 
converted to proper UTF-8 while importing? Am I missing something obvious?
2) Should I submit this as a bug report?
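
To make the iconv() idea concrete, here is a minimal sketch of a possible 
workaround. It assumes the strings in the file are Latin-1 (which the \374 
byte suggests, but which I have not verified for the file as a whole). The 
helper name to_utf8 is hypothetical and not part of the foreign package.

```r
# Hypothetical helper (not part of foreign): re-encode character vectors
# and factor levels, and leave everything else untouched.
# Assumption: the source strings are Latin-1.
to_utf8 <- function(x, from = "latin1") {
  if (is.factor(x)) {
    levels(x) <- iconv(levels(x), from = from, to = "UTF-8")
  } else if (is.character(x)) {
    x <- iconv(x, from = from, to = "UTF-8")
  }
  x
}

# Self-contained check: build "im B\374ro" from raw Latin-1 bytes.
latin1_str <- rawToChar(as.raw(c(0x69, 0x6d, 0x20, 0x42, 0xfc, 0x72, 0x6f)))
utf8_str <- to_utf8(latin1_str)
charToRaw(utf8_str)  # the single byte fc should become the two bytes c3 bc

# Intended use with the file above (untested sketch):
# dat <- read.spss("comphomeneu.sav")
# dat[] <- lapply(dat, to_utf8)
```

This only touches the values and factor levels themselves. Attributes such as 
variable labels would presumably need the same treatment.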

Thanks!
Thomas Friedrichsmeier