[R] How to replace German umlauts in strings?

Hans-Jörg Bibiko bibiko at eva.mpg.de
Thu Apr 10 18:22:08 CEST 2008


On 10.04.2008, at 18:03, Hofert Marius wrote:
> I have a file containing names of German students. These names
> contain the characters "ä", "ö" or "ü" (German umlauts). I use
> read.table() to read the file and let's assume the table is then
> stored in a variable called "data". The names are then contained in
> the first column, i.e. data[,1]. Now if I simply display the variable
> "data", I see that "ä" is replaced by \x8a, "ö" is replaced by \x9a
> and so forth. Now, I would like to have these characters replaced by
> their LaTeX (or TeX) equivalents, meaning \x8a should be replaced by
> \"a, \x9a should be replaced by \"o and so forth. I tried a lot,
> especially with gsub(), however, the backslashes cause problems and I
> do not know how to get this to work. The data.frame should then be
> written to a file without destroying the replaced substrings (so that
> indeed \"a appears in the file for \x8a). Is this possible?
>
> Here is a minimal example:
> data=data.frame(names=c("Bj\x9arn","S\x9aren"),points=c(10,20),
>                 stringsAsFactors=FALSE)
> data[1,1]=gsub('\\x9a','\\"o',data[1,1]) #does not work! (neither do similar calls)

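Try this (the replacement needs four backslashes, because gsub() treats a
backslash in the replacement string as an escape character):

data[,1] <- gsub('\\x9a', '\\\\"o', data[,1], perl = TRUE, useBytes = TRUE)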
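
A slightly fuller sketch, in case it helps: it applies the same idea to
several umlauts and then writes the table to a file. The bytes for "ä" and
"ö" are the ones from your mail; \x9f for "ü" is only my assumption (it is
the Mac Roman byte), and the names "umlauts" and "out_file" are made up for
illustration.

```r
# Sample data from the question; "\x9a" is the raw byte shown for "ö".
data <- data.frame(names = c("Bj\x9arn", "S\x9aren"),
                   points = c(10, 20),
                   stringsAsFactors = FALSE)

# Byte pattern -> TeX replacement. Each replacement is written with four
# backslashes so that one literal backslash survives gsub()'s own
# escape handling of the replacement string.
umlauts <- c("\\x8a" = '\\\\"a',   # "ä", byte taken from the question
             "\\x9a" = '\\\\"o',   # "ö", byte taken from the question
             "\\x9f" = '\\\\"u')   # "ü", ASSUMED byte (Mac Roman)

# useBytes = TRUE makes gsub() match raw bytes instead of characters,
# so the strings do not have to be valid in the current locale.
for (pat in names(umlauts)) {
  data$names <- gsub(pat, umlauts[[pat]], data$names,
                     perl = TRUE, useBytes = TRUE)
}

# quote = FALSE keeps write.table() from quoting the strings, so the
# TeX sequences reach the file unchanged.
out_file <- tempfile(fileext = ".txt")   # hypothetical output path
write.table(data, file = out_file, quote = FALSE, row.names = FALSE)
writeLines(readLines(out_file))
```

Note that print(data) will still show doubled backslashes, because R escapes
them on display; the file itself should contain single ones. If you know the
encoding of the input file, passing it to read.table() via fileEncoding may
be the cleaner route.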

Cheers,

--Hans

