[Rd] encoding question again

Simon Urbanek simon.urbanek at r-project.org
Sat Dec 29 17:11:56 CET 2007


Hello Matthias,

On Dec 27, 2007, at 3:52 PM, Matthias Wendel wrote:

> Hi, simon,
> 	i followed your advice by adding/changing the lines
>   abt = iconv(abt,"utf-8","latin1")
>   zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",  
> encoding = "latin1")
> but this yielded the same results.

I finally have a Windows machine for testing, and on my machine the  
file name is created correctly ...

Still, the locale is apparently not right: JGR always uses UTF-8,  
but the system reports CP1252. That seems to be why the automatic  
conversion (file(..., encoding = ...)) does not work. What does always  
work, however, is the explicit conversion:

a=file("foo","wt")
writeLines(iconv(..., "utf-8","latin1"),a)
close(a)
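
A slightly fuller sketch of the same explicit conversion, for anyone who wants to check the bytes that end up on disk. The sample string, the temporary file and the `useBytes = TRUE` argument are assumptions added for illustration and are not part of the original code; `useBytes = TRUE` is there so that writeLines() does not re-encode the converted string on output, and binary mode ("wb") just keeps the line ending a single "\n" on every platform.

```r
# Sample text built from an escape so this script is plain ASCII (assumption).
txt <- enc2utf8("n\u00fcr")                    # "nür" as UTF-8

# Explicit conversion, as in the snippet above.
latin1_txt <- iconv(txt, "UTF-8", "latin1")

out <- tempfile(fileext = ".html")             # hypothetical output file
con <- file(out, "wb")                         # binary mode: no newline translation
writeLines(latin1_txt, con, useBytes = TRUE)   # write the bytes unchanged
close(con)

# Read the file back as raw bytes to see what was actually written.
bytes <- readBin(out, "raw", n = file.size(out))
print(bytes)
```

If the conversion took effect, the umlaut shows up as the single byte fc rather than the UTF-8 pair c3 bc.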

(FWIW: since the recommended encoding for web pages is UTF-8 anyway,  
you don't really need it ... ;))

charToRaw is always a good test, because UTF-8 usually needs 2 bytes  
for an umlaut and latin1 only one.
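
As a rough illustration of that test, here is a small sketch that compares the byte counts of one label in both encodings. The sample word is taken from the label in the original question; the variable names are made up for this example.

```r
# "Ärzte", built from an escape so this script is plain ASCII.
s_utf8   <- enc2utf8("\u00c4rzte")
s_latin1 <- iconv(s_utf8, "UTF-8", "latin1")

# The raw bytes show the difference directly.
print(charToRaw(s_utf8))     # the umlaut takes two bytes here
print(charToRaw(s_latin1))   # and a single byte here

# Same check as a number: bytes per string.
n_utf8   <- nchar(s_utf8,   type = "bytes")
n_latin1 <- nchar(s_latin1, type = "bytes")
print(c(utf8 = n_utf8, latin1 = n_latin1))
```

The two UTF-8 bytes for the umlaut are c3 84; read as CP1252 they display as "Ã„", which matches the garbled label reported in the original question.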

Best regards,
Simon


> -----Original Message-----
> From: Simon Urbanek [mailto:simon.urbanek at r-project.org]
> Sent: Thursday, December 27, 2007 21:40
> To: Matthias Wendel
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] encoding question again
>
> Matthias,
>
> you get exactly what you specified - namely UTF-8. If you want your  
> html file to be latin1, then you have to say so:
>
> zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",  
> encoding = "latin1")
>
> In addition, you're assuming that `abt' is in the correct encoding  
> to be understood by your OS. If it's not, you better convert it into  
> one.
> From your results it seems as if `abt' is also UTF-8 encoded. Since  
> you didn't tell us where you got that from, you should either fix  
> the source or use something like iconv(abt,"utf-8","latin1"):
>
> (in UTF-8 locale)
>> abt="nür"
>> cat(abt,"\n")
> nür
>> charToRaw(abt)
> [1] 6e c3 bc 72
>> charToRaw(iconv(abt,"utf-8","latin1"))
> [1] 6e fc 72
>
> Cheers,
> Simon
>
>
> On Dec 27, 2007, at 3:11 PM, Matthias Wendel wrote:
>
>> Hi, R Devils,
>> I'm running the actual R version in JGR (version 1.5-8 ).
>> Sys.getlocale(category = "LC_ALL") yields [1]
>> "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.
>> 1252;LC_MONETARY=German_Germany.
>> 1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
>>
>> I want to write some HTML-Code enhanced by statistical results and
>> labels encoded in Latin-1, which I pass to a function. Some label
>> shall generate the filename. Although the labels are correctly  
>> handled
>> in JGR they are somehow converted when they are written to the file.
>> Also the filename is not constructed as wanted. The function
>> definition is correctly sourced into R. The function is defined like
>> this:
>>
>> Itemtabelle.head <- function (abt ){
>>  # nür zöm TÄST
>>  zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",
>> encoding = "UTF-8")
>>  cat(as.character("<html
>> xmlns:o=\"urn:schemas-microsoft-com:office:office
>> \" xmlns:x=\"urn:schemas-microsoft-com:office:excel\"
>> xmlns=\"http://www.w3.org/TR/REC-html40
>> \">  \n"),
>>      as.character("
>> <
>> head
>>>
>>
>> \n "),
>> 		.
>> 		.
>> 		.
>>      as.character("        <td colspan=5 class=xl28 width=727 style=
>> \'width:545pt\'>Gesundheitsindikatoren:  "), abt, as.character("</
>> td>                                   \n"),
>>      as.character("       </
>> tr
>>>
>>
>> "), file  = zz)
>>      close(zz)
>>      unlink(zz)
>> }
>> Setting abt as " Ärzte Innere, Gynäkologie" and calling the function
>> with this argument, yields a filename "Itemtabelle  Ärzte Innere,
>> Gynäkologie .html" and in the file a line
>>        <td colspan=5 class=xl28 width=727 style='width:
>> 545pt'>Gesundheitsindikatoren:    Ärzte Innere, Gynäkologie </
>> td>
>> is generated.                                 .
>> I tried to solve this by using iconv, without success.
>> The problem remains the same in the rgui and rterm - in rterm the
>> resulting filename is "Itemtabelle Žrzte Innere, Gyn„kologie  .html".
>>
>> Cheers,
>> Matthias
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>


