[R] How to replace German umlauts in strings?

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Apr 10 21:06:22 CEST 2008


Or declare the encoding when reading the file, so that it is re-encoded on input:

read.table(file("umlaut.txt", encoding="MAC"), ...)
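
To make that concrete, here is a minimal, self-contained sketch. The file contents are made up for illustration (two names with "ö" stored as the single byte 0x9a, as in the thread), and the encoding name "MAC" is an assumption about the platform: the names iconv accepts differ between systems, so check iconvlist() if it is rejected.

```r
# Build a small MacRoman-encoded file byte by byte (illustrative data only).
tmp <- tempfile(fileext = ".txt")
bytes <- c(
  charToRaw("names points\nBj"), as.raw(0x9a), charToRaw("rn 10\nS"),
  as.raw(0x9a), charToRaw("ren 20\n")
)
writeBin(bytes, tmp)

# Declare the file's encoding on the connection. read.table() opens the
# connection, reads text re-encoded to the session's native encoding,
# and closes the connection again when it is done.
dat <- read.table(file(tmp, encoding = "MAC"), header = TRUE, as.is = TRUE)
print(dat)

unlink(tmp)
```

This only gives readable names if the session's native encoding can represent the characters, as a UTF-8 or Latin-1 locale can.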


On Thu, 10 Apr 2008, Peter Dalgaard wrote:

> Dieter Menne wrote:
>> Hans-Jörg Bibiko <bibiko <at> eva.mpg.de> writes:
>>
>>
>>> On 10.04.2008, at 18:03, Hofert Marius wrote:
>>>
>>>> I have a file containing names of German students. These names
>>>> contain the characters "ä", "ö" or "ü" (German umlauts). I use
>>>> read.table() to read the file and let's assume the table is then
>>>> stored in a variable called "data". The names are then contained in
>>>> the first column, i.e. data[,1]. Now if I simply display the variable
>>>> "data", I see that "ä" is replaced by \x8a, "ö" is replaced by \x9a,
>>>> and so forth.
>>>>
>>
>> This is strange. When I have a file umlaut.txt
>>
>> Name
>> Äserich
>> Ömadel
>> Übermunsch
>>
>> and read it in with
>>
>> umlaut = read.table("umlaut.txt", header = TRUE)
>> umlautasis = read.table("umlaut.txt", header = TRUE, as.is = TRUE)
>>
>> I get the following in both cases:
>>
>>  umlautasis
>>         Name
>> 1    Äserich
>> 2     Ömadel
>> 3 Übermunsch
>>
>> This is on Windows Vista. I use it every day without ever having seen garbled
>> encodings, typically with the following in LaTeX
>>
>> \usepackage[T1]{fontenc}
>> \usepackage{textcomp}
>> \usepackage{babel}
>> \usepackage[latin1]{inputenc} % For ü,ä
>>
>>
>> Dieter
>>
> The thing is that \x9a for o-umlaut is an unusual encoding; only Mac-style encodings decode it that way:
>
> > names(which(sapply(iconvlist(),iconv, x="S\x9aren")=="Sören"))
> [1] "CP1282"            "CSMACINTOSH"       "MAC"
> [4] "MAC-CENTRALEUROPE" "MACINTOSH"         "MACIS"
> [7] "MAC-IS"            "MAC-SAMI"
> > iconv("öäüÖÄÜ", to="MAC")
> [1] "\x9a\x8a\x9f\x85\x80\x86"
>
>
> and accordingly,
>
> > data$names <- iconv(data$names,from="MAC")
> > data
>  names points
> 1 Björn     10
> 2 Sören     20
>
> or, if you need to do it for many variables, this should work:
>
> ix <- sapply(data, is.character)
> data[ix] <- lapply(data[ix], iconv, from="MAC")


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

