[R] read.spss, locale and encodings
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Apr 8 16:17:51 CEST 2009
Hans Ekbrand wrote:
> On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
>> Hans Ekbrand wrote:
>>> I must be missing something obvious here:
>>>
>>> According to the help page for read.spss, the reencode option is only
>>> active when R is run under a UTF-8 locale.
>> Not in my version:
>>
>> reencode: logical: should character strings be re-encoded to the
>> current locale. The default, 'NA', means to do so in a UTF-8
>> locale, only. Alternatively character, specifying an
>> encoding to assume.
>
> OK, thanks for that correction, but the problem isn't solved, since
> read.spss fails, see below. When read.spss succeeds, the option is
> not useful, since then the current locale is iso88591(5).
>
>> So, does it help with reencode="Latin1"? Presumably this comes from
>> assuming UTF-8 when it isn't.
>
>> Sys.getlocale()
> [1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"
>> test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
> Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
> error reading system-file header
> In addition: Warning message:
> In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
> wo.sav: position 143: Variable name begins with invalid character
>
> Using another version of the dataset, where I have successfully
> encoded the names to UTF-8, here is the problematic variable name:
>
> names(Workorientation.2005.Swe)[143]
> [1] "KÖN1"
>
>> 8.34 is used in the current prerelease. AFAIR, some issues with
>> encodings were fixed recently.
>
> Someone running foreign 8.34 that is willing to test my SPSS-file?
Someone with an SPSS file problem willing to help test the prereleases? :-)
You could start by placing it somewhere accessible...
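
For illustration only, here is a minimal sketch of the mismatch that seems
to be in play, assuming the variable name is stored in the file as Latin-1
bytes. It does not touch the actual wo.sav file; the byte values are simply
the Latin-1 encoding of "KÖN1", and the object names are made up for the
example. validUTF8() requires a reasonably recent R.

```r
# "KÖN1" as it would be stored in Latin-1: Ö is the single byte 0xD6.
latin1_name <- rawToChar(as.raw(c(0x4B, 0xD6, 0x4E, 0x31)))

# Interpreted as UTF-8, the lone 0xD6 byte is not a valid sequence,
# which is the kind of thing a reader assuming UTF-8 would reject.
is_valid_before <- validUTF8(latin1_name)

# Declaring the source encoding and converting yields proper UTF-8,
# where Ö becomes the two bytes 0xC3 0x96.
utf8_name <- iconv(latin1_name, from = "latin1", to = "UTF-8")
utf8_bytes <- charToRaw(utf8_name)
is_valid_after <- validUTF8(utf8_name)

# The corresponding read.spss call (not run here; needs the foreign
# package and the file, and may still fail on the header as reported):
# test <- read.spss("wo.sav", to.data.frame = TRUE, reencode = "latin1")
```

Whether reencode can help at all depends on the header being read
successfully first, which is exactly what fails in the report above.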
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907