[R] read.spss and encodings
Thomas Friedrichsmeier
thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Feb 1 13:52:48 CET 2007
Hi!
I'm having trouble importing SPSS files containing non-ASCII characters
(R 2.4.1, Debian Linux, i386). To reproduce:
Download the following file:
http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav
library(foreign)
Sys.setlocale(locale = "C")
# assumes comphomeneu.sav was saved to the working directory
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1] im B\374ro
# Levels: im B\374ro zuhause
\374 is, of course, actually a u-umlaut (ü). However, I guess in the C locale
it is not expected to print as such. But now try this (use any UTF-8 locale
you have installed):
Sys.setlocale(locale = "de_DE.UTF-8")
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1]Error in print.default(xx, quote = quote, ...) :
# invalid multibyte string
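The two outputs are consistent with the file storing its strings as single Latin-1 bytes. The following is a minimal sketch, assuming Latin-1 (ISO-8859-1) is the file's encoding, that checks this byte by byte: octal \374 is 0xFC, which is "ü" in Latin-1 but is not a valid sequence on its own in UTF-8. Note that validUTF8() needs a much newer R (3.3.0 or later) than the 2.4.1 used above.

```r
# The escape \374 is octal: 3*64 + 7*8 + 4 = 252 = 0xFC
byte_val <- strtoi("374", base = 8L)
b <- as.raw(byte_val)                     # the single byte 0xFC

# In Latin-1, code point 252 is U+00FC, "u with diaeresis"
print(utf8ToInt("\u00fc") == byte_val)    # TRUE

# A lone 0xFC byte is not valid UTF-8, which is what the
# "invalid multibyte string" error in a UTF-8 locale complains about
latin1_str <- rawToChar(b)
print(validUTF8(latin1_str))              # FALSE

# Re-encoding from Latin-1 yields the two-byte UTF-8 form C3 BC
utf8_str <- iconv(latin1_str, from = "latin1", to = "UTF-8")
print(charToRaw(utf8_str))                # c3 bc
print(validUTF8(utf8_str))                # TRUE
```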
To me it looks like read.spss() would need an encoding parameter and/or some
iconv() magic. Locale conversion always makes my head spin, though, so I
thought I had better post here before calling this a bug in R. Two questions:
1) Is there some way to work around this, i.e., to make sure the strings are
converted to proper UTF-8 while importing? Am I missing something obvious?
2) Should I submit this as a bug report?
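One possible "iconv() magic" workaround for question 1 is to re-encode the text after import. This is a minimal sketch, assuming the file's strings are Latin-1 (a Windows-1252 file would behave the same for "ü"); reencode_spss() is a hypothetical helper, not part of foreign. It rewrites only factor levels and character vectors, so the integer codes and any attributes of the read.spss() result are left alone. The demo builds a stand-in for the ARBEIT variable from raw bytes so it runs without the .sav file.

```r
# Hypothetical helper (not part of 'foreign'): re-encode the factor levels
# and character vectors of a read.spss() result, assuming Latin-1 input.
reencode_spss <- function(dat, from = "latin1", to = "UTF-8") {
  for (nm in names(dat)) {
    x <- dat[[nm]]
    if (is.factor(x)) {
      # Only the level labels carry text; the integer codes are untouched
      levels(x) <- iconv(levels(x), from = from, to = to)
    } else if (is.character(x)) {
      x <- iconv(x, from = from, to = to)
    }
    dat[[nm]] <- x
  }
  dat
}

# Simulated stand-in for read.spss("comphomeneu.sav"): one factor whose
# first level holds the Latin-1 bytes of "im Büro". Levels are given
# explicitly so no locale-dependent sorting of invalid strings happens.
buero <- rawToChar(as.raw(c(0x69, 0x6d, 0x20, 0x42, 0xfc, 0x72, 0x6f)))
dat <- list(ARBEIT = factor(c(buero, "zuhause"), levels = c(buero, "zuhause")))

fixed <- reencode_spss(dat)
print(validUTF8(levels(fixed$ARBEIT)))    # TRUE TRUE
```

With the real file this would be used as reencode_spss(read.spss("comphomeneu.sav")). Later versions of foreign added a reencode argument to read.spss() that performs this conversion at import time; the helper above is only a sketch for versions without it.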
Thanks!
Thomas Friedrichsmeier
More information about the R-help mailing list