[R] WG: AW: Another problem with encoding

Matthias Wendel office at matthiaswendel.de
Wed Jan 2 16:02:16 CET 2008


Hello, Peter,
	I tried it out: iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1", sub='byte') yielded 

[1] "<c4>rzte Chirurgie" 

and c4 corresponds to Ä in most single-byte Western encodings (e.g. Latin-1, Windows-1252). What can I do next? I wonder whether there is a more convenient way than replacing each
occurrence of <..> with the appropriate character.
Regards,
Matthias
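
A possible alternative to patching the <..> markers by hand, sketched under the assumption that the labels are stored as Latin-1 (which the c4 byte and the German_Germany.1252 locale suggest, but which the thread does not confirm): convert from Latin-1 to UTF-8, rather than from UTF-8 to Latin-1. The helper latin1_label and the vector value_labels are hypothetical stand-ins for attr(spss[, "Y6"], "value.labels"), built from raw bytes so the example does not depend on the encoding of the script file.

```r
# Hypothetical helper for this sketch: prepend the Latin-1 byte for "Ä" (0xC4)
# to an ASCII tail, giving a string whose bytes are Latin-1, not UTF-8.
latin1_label <- function(ascii_tail) {
  rawToChar(c(as.raw(0xC4), charToRaw(ascii_tail)))
}

# Stand-in for the value labels read by read.spss(): codes named by labels.
value_labels <- c(3, 2)
names(value_labels) <- c(latin1_label("rzte Chirurgie"),
                         latin1_label("rzte Innere"))

# Convert from the encoding the bytes are assumed to be in (Latin-1) to UTF-8.
names(value_labels) <- iconv(names(value_labels), from = "latin1", to = "UTF-8")

# The single byte c4 is now the two-byte UTF-8 sequence c3 84,
# and utf8ToInt() accepts the converted string.
first_label_bytes <- charToRaw(names(value_labels)[1])
first_code_point  <- utf8ToInt(names(value_labels)[1])[1]  # 196, i.e. U+00C4
```

If the file was written under Windows-1252, from = "CP1252" may be the closer match, since the two encodings differ in the range 0x80 to 0x9F; whether that name is accepted depends on the platform's iconv.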

-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
Sent: Tuesday, 1 January 2008 20:21
To: Matthias Wendel
Subject: Re: AW: [R] Another problem with encoding

Matthias Wendel wrote:
> Happy new year and my apologies, Peter. Here are the missing facts:
> I'm reading in an SPSS file, doing some calculations and putting the 
> results in an XML file. The XML file is UTF-8 encoded, and so should be the results and their labels (e.g. Ärzte Chirurgie).
> Here is part of the R session:
>
>   
As a matter of principle: Requests for more information are not offers that I will solve your problems personally. Stay on the list!

The characters seem to travel OK in email, so latin1 is a guess. Have you tried the sub="byte" argument to iconv()?
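
A minimal sketch of that diagnostic, assuming a label whose first byte is 0xC4 as in the session quoted below. The string x is a hypothetical stand-in built from raw bytes, and validUTF8() needs R 3.3.0 or later, so it was not available when this thread was written.

```r
# Stand-in for the problematic label: byte 0xC4 followed by ASCII "rzte".
x <- rawToChar(c(as.raw(0xC4), charToRaw("rzte")))

# Look at the bytes directly: c4 72 7a 74 65.
bytes <- charToRaw(x)

# 0xC4 opens a two-byte UTF-8 sequence, but 0x72 is not a continuation byte,
# so the string is not valid UTF-8.
is_utf8 <- validUTF8(x)

# Declaring the input as UTF-8 therefore fails outright ...
strict <- iconv(x, "UTF-8", "latin1")

# ... while sub = "byte" keeps going and marks each offending byte in hex.
marked <- iconv(x, "UTF-8", "latin1", sub = "byte")
```

The marked bytes show which encoding to try as the from argument; a lone c4 where Ä is expected points to a single-byte encoding such as Latin-1.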



>   
>> Sys.getlocale()
>>     
> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
>   
>> spss[,'Y6']
>>     
>   [1]  6  3  8 11  8  9  6  8  3  5 10 15 NA  9  8  3  8 16  6  6 NA 10  5  2  7  7  6 16  7 15  7 10 12
>  [34]  8  7 12 12 16  7  6  8  8 15  6 NA  8 99  7 12  8  9 16  7 16  8  7  7  1 15 12  8  7 10  7  8  7
>  [67]  8  9  8  6  6  8  6 16 11  5 11 11  1 11  3  7  7 10 10 10  6 11 16 NA  1  3  2 10 99 10  3  3  9
> [100]  7 16 99 16  1 10  2 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 NA 10 16 16 NA  6 10  5 11
> [133] 11  1  1  1  1 16  1 16  1  1  1  1  6  6  6 16  8 16 16 16 16  5  6 10 99 11 11 10  6  6  1  1  6
> [166]  1 11 11 16  9 11 16  6  8  8 16 16  8  6 16 16 12 12 12 12 12 12 12 16  9 16 15 12 12 15 10 16 15
> [199]  4  1  2 14  4  4  2  5 NA  1  5  5  7  9  5 12 12 NA 16 12 12 12 12 12 12 12 12 12 99 NA 12 12 NA
> [232]  1 16  1  7 11  5  6  7  1 13  6  8 16  2  1  5 16 16  9  8  8  8  7 16  8  8  2  8  5  4  6 14  5
> [265] 14  8  8 14  4  4  8 14  8 14  6  2  3 14  3 16  5 15 15 15 15 15 15 15 15 15 15 15 13 13 13 13 13
> [298] 13 13 13 13 13 13 13 13 15  6 NA 12  3  9  9 NA 10 16
> attr(,"value.labels")
>                           Verwaltung Servicegesellschaft Waldfriede (SKW) 
>                                   16                                   15 
>            Kurzzeitpflege Waldfriede                        Sozialstation 
>                                   14                                   13 
>                  Krankenpflegeschule              Med. Technischer Dienst 
>                                   12                                   11 
>                            Pflege OP                      Funktionsdienst 
>                                   10                                    9 
>                   Pflege Gynäkologie                     Pflege Chirurgie 
>                                    8                                    7 
>                        Pflege Innere            Ärzte Anästhesie, Röntgen 
>                                    6                                    5 
>                    Ärzte Gynäkologie                      Ärzte Chirurgie 
>                                    4                                    3 
>                         Ärzte Innere         Patientenberatung/-betreuung 
>                                    2                                    1 
>   
>> names(attributes(spss[,'Y6'])[[1]][14])
>>     
> [1] "Ärzte Chirurgie"
>   
>> iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1")
>>     
> [1] NA
>   
>> utf8ToInt(names(attributes(spss[,'Y6'])[[1]][14]))
>>     
> Error in utf8ToInt(names(attributes(spss[, "Y6"])[[1]][14])) : 
>   invalid UTF-8 string
>   
>
> Cheers,
> Matthias
>
>
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
> Sent: Monday, 31 December 2007 10:45
> To: Matthias Wendel
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Another problem with encoding
>
> Matthias Wendel wrote:
>   
>> Hi
>>     I've imported an SPSS file using read.spss. One variable has values 
>> like 'Ärzte'. I thought this was UTF-8 encoded, but it is not (as the results of iconv and utf8ToInt suggest). Is there any way to
>> find out how these SPSS values are encoded?
>>     
> You are assuming a bit much of your readers.
>
> What exactly are you doing? Is it a value, a value label, or perhaps a variable name? How do the results of read.spss look on the R
> side? How did you apply iconv and utf8ToInt? What is your locale?
>
> I mean, we could try and guess all those details, but you are the one with the hard info, and the motivation...
>
>   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



