[R] Umlaut read from csv-file

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Nov 6 23:51:03 CET 2008


Look at Encoding() on your two strings.  The results are different, and 
this seems to be the root of the problem.  Adding encoding="latin1" to the 
read.csv call is a workaround.
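A minimal sketch of the workaround: declare the file's encoding when reading, so the strings come back with a known encoding mark. This departs from the original setup. The report used a CP1252 (Latin-1) Windows locale, where the matching argument is encoding="latin1". To make the sketch behave the same in any locale, it writes the file as raw UTF-8 bytes and reads it with encoding="UTF-8". The temporary file and the byte-level write are illustrative choices, not part of the original example.

```r
# Sketch: declare the input encoding so read strings carry a known mark.
# Assumption: the file holds UTF-8 bytes (for a Latin-1 file, as in the
# original report, the corresponding argument is encoding = "latin1").
x0 <- "div 1-2 Ver\u00e4nderungen"   # \u escape gives a UTF-8-marked string

tmp <- tempfile(fileext = ".csv")
# Write the CSV as raw UTF-8 bytes so the file does not depend on the locale
txt <- enc2utf8(paste0('"x"\n"', x0, '"\n'))
writeBin(charToRaw(txt), tmp)

# encoding = "UTF-8" only marks the strings read; it does not re-encode them
x <- read.csv(tmp, header = TRUE, as.is = TRUE, encoding = "UTF-8")$x

Encoding(x0)   # inspect the declared encoding of each string
Encoding(x)
x == x0

unlink(tmp)    # remove the temporary file
```

To re-encode a file's contents rather than just mark them, read.csv also accepts fileEncoding, which is the more general choice when the file's encoding differs from the session's.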

It looks like there is a problem in the use of the CHARSXP cache: if I 
save the session then x0 == x becomes true when I reload it, even though 
the encodings remain different.
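To see the difference between the text and its encoding, the sketch below builds the same string twice: once as Latin-1 bytes and once as UTF-8. It follows the pattern of the examples in ?Encoding, and the variable names are illustrative. In current R versions, == compares the text after translating both strings to a common encoding. The comparison is therefore TRUE even though the declared encodings and the underlying bytes differ. The FALSE reported for R 2.8.0 was the bug described above.

```r
# Same text, two encodings: Latin-1 bytes vs UTF-8 bytes.
x_latin1 <- "div 1-2 Ver\xe4nderungen"   # byte 0xE4 is a-umlaut in Latin-1
Encoding(x_latin1) <- "latin1"            # declare the bytes as Latin-1
x_utf8 <- "div 1-2 Ver\u00e4nderungen"    # \u escape gives a UTF-8-marked string

Encoding(c(x_latin1, x_utf8))             # declared encodings differ
charToRaw(x_latin1)[12]                   # one byte for the umlaut in Latin-1
charToRaw(x_utf8)[12:13]                  # two bytes for it in UTF-8

x_latin1 == x_utf8                        # compares the text, not the bytes
```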

I've found the immediate cause and will change this in R-patched shortly.

On Thu, 6 Nov 2008, Heinz Tuechler wrote:

> Dear All!
>
> Reading character strings containing an "umlaut" from a csv-file, I find a (to
> me) surprising behaviour in R 2.8.0 that I did not notice in R 2.7.2.
> A comparison by "==" results in FALSE, while grep does find a match.
> See the example below.
> The crucial line is x=="div 1-2 Veränderungen", with the result [1] FALSE in 
> R 2.8.0 but
> [1] TRUE in R 2.7.2.
>
> Thank you in advance for your help
>
> Heinz Tüchler
>
> ##### in R 2.8.0 patched
>
> x0 <- "div 1-2 Veränderungen" # define a character string
>
> write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
> rm(x0)
>
> x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
> x
> x=="div 1-2 Veränderungen"
>> [1] FALSE
> grep("div 1-2 Veränderungen", x)
>> [1] 1
> grep("div 1-2 Veränderungen", x, value=TRUE)
>> [1] "div 1-2 Veränderungen"
>
> unlink('chr.csv') # delete file
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status = Patched
> major = 2
> minor = 8.0
> year = 2008
> month = 11
> day = 04
> svn rev = 46830
> language = R
> version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>
> Windows XP (build 2600) Service Pack 2
>
> Locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices, 
> package:utils, package:datasets, package:methods, Autoloads, package:base
>
>
> ##### in R 2.7.2 patched
>
>
> x0 <- "div 1-2 Veränderungen" # define a character string
>
> write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
> rm(x0)
>
> x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
> x
> x=="div 1-2 Veränderungen"
>> [1] TRUE
> grep("div 1-2 Veränderungen", x)
>> [1] 1
> grep("div 1-2 Veränderungen", x, value=TRUE)
>> [1] "div 1-2 Veränderungen"
>
> unlink('chr.csv') # delete file
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status = Patched
> major = 2
> minor = 7.2
> year = 2008
> month = 09
> day = 02
> svn rev = 46486
> language = R
> version.string = R version 2.7.2 Patched (2008-09-02 r46486)
>
> Windows XP (build 2600) Service Pack 2
>
> Locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices, 
> package:utils, package:datasets, package:methods, Autoloads, package:base

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

