[R] Umlaut read from csv-file

Fri Nov 7 15:36:15 CET 2008

At 13:34 07.11.2008, Peter Dalgaard wrote:
>Heinz Tuechler wrote:
> > Dear Prof.Ripley!
> >
> > Thank you very much for your attention. In the given example, Encoding()
> > or the encoding parameter of read.csv solves the problem. I hope your
> > patch will also solve the problem when I read an SPSS file with
> > spss.get(), since this function has no encoding parameter and my real
> > problem originated there.
>
>read.spss() (package foreign) does have a reencode argument, though; and
>  this is called by spss.get(), so it looks like an easy hack to add it
>there.

Thank you. That means I have to change spss.get 
to make it accept the reencode argument and pass 
it to read.spss. For the moment I prefer to step 
back to R 2.7.2 and wait for a more general 
solution, because to me there still seem to be strange effects of encoding.
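
Short of changing spss.get, the encoding could also be declared on the imported data frame afterwards. This is only a sketch of that idea, assuming the strings in the file really are Latin-1. The helper mark_latin1 and the small example data frame are hypothetical and not part of Hmisc or foreign.

```r
# Hypothetical helper: declare all character columns and factor levels
# of a data frame as Latin-1. Assumes the underlying bytes are Latin-1.
mark_latin1 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      Encoding(col) <- "latin1"      # sets the mark, bytes are unchanged
    } else if (is.factor(col)) {
      lev <- levels(col)
      Encoding(lev) <- "latin1"
      levels(col) <- lev
    }
    df[[nm]] <- col
  }
  df
}

# Stand-in for a data frame returned by an import function.
# "\xfc" is the single byte FC, which is "ü" in Latin-1.
d <- data.frame(id = 1:2, name = c("M\xfcller", "Huber"),
                stringsAsFactors = FALSE)
d2 <- mark_latin1(d)
Encoding(d2$name)
```

Pure ASCII strings such as "Huber" stay "unknown", because R never marks them.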

In the following example the encoding gets lost 
by dumping and rereading, even if I use the 
encoding parameter of source(). But maybe I 
don't understand what this parameter should do.

Heinz Tüchler

us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')
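
One possible way to restore the declaration after rereading is to set it again by hand. The following is a minimal sketch, assuming the bytes in the vector really are Latin-1. It builds the vector from byte escapes so that it does not depend on the locale, and it does not repeat the dump()/source() round trip, whose result may vary with R version and locale.

```r
# The bytes E4, F6, FC are "ä", "ö", "ü" in Latin-1 (assumed here).
us <- c("a", "b", "c", "\xe4", "\xf6", "\xfc")

# Declare the encoding. This only sets a mark; the bytes stay the same.
Encoding(us) <- "latin1"
Encoding(us)

# Convert to UTF-8 if a real re-encoding is wanted.
us_utf8 <- enc2utf8(us)
Encoding(us_utf8)
charToRaw(us_utf8[4])   # the UTF-8 bytes for "ä"
```

Encoding<- changes only the declared encoding, while enc2utf8() actually translates the bytes.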

>--
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907