[R] Problem with ONE of the Special German Characters

Duncan Murdoch murdoch at stats.uwo.ca
Thu Apr 15 19:32:48 CEST 2010


On 15/04/2010 12:22 PM, Michael Stegh wrote:
> Dear List,
>
> I have data which contain the special German characters "ä", "ö", "ü" etc. After reading the
> text files into R those characters are displayed strangely, e.g. "ä" is "Ã¤". The first step is to
> replace those with their typical transcription, e.g. "ä" becomes "ae" by using the gsub
> command.
>   

Your example of "Ã¤" is what you would see if the text was stored in
UTF-8 encoding and then read as Latin1.  So I suspect you need to
declare the encoding of the files before reading them.  You can do
this as follows:

# Declare the file's encoding; input is re-encoded as it is read
con <- file("foo.txt", encoding="UTF-8", open="r")
readLines(con)
close(con)  # release the connection

By default, R assumes that the encoding of a file matches the native
encoding of your system.

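The diagnosis above can be checked without any file at all. The
following is only a sketch of the mechanism: it takes the two bytes that
UTF-8 uses for "ä" and interprets them once as Latin1 and once as UTF-8.
The variable names are made up for the illustration.

```r
# The two bytes that encode "ä" in UTF-8.
utf8_bytes <- as.raw(c(0xc3, 0xa4))

# Misreading: treat each byte as one Latin1 character, then convert the
# result to UTF-8 so that it can be inspected.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
utf8ToInt(misread)   # 195 164, i.e. U+00C3 ("Ã") and U+00A4 ("¤")

# Correct reading: declare that the bytes are UTF-8.
correct <- rawToChar(utf8_bytes)
Encoding(correct) <- "UTF-8"
utf8ToInt(correct)   # 228, i.e. U+00E4 ("ä")
```

One character turning into two is the typical sign that UTF-8 text was
read under a single-byte encoding.
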
> Until I upgraded to version 2.10.1 (from 2.8.0) this worked perfectly for all characters. Now it
> works for all characters but "Ü".
>
> temp1<-gsub("Ãœ","Ue",temp1)
>   

You might want to try perl=TRUE in the gsub() call; it seems to handle
non-ASCII characters in regular expressions better than the default TRE
library does.
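
Once the text has been read with the right encoding, the transcription
step could look roughly like the sketch below. The sample words and the
two lookup vectors are invented for the illustration, and the umlauts
are written as \u escapes so that the code does not depend on the
encoding of the script file.

```r
# Sample data: "Übung", "Mädchen", "schön"
x <- c("\u00dcbung", "M\u00e4dchen", "sch\u00f6n")

# Hypothetical lookup vectors: ä ö ü Ä Ö Ü and their transcriptions
from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
to   <- c("ae", "oe", "ue", "Ae", "Oe", "Ue")

# Replace one character at a time, using the PCRE engine
for (i in seq_along(from)) {
  x <- gsub(from[i], to[i], x, perl = TRUE)
}
x   # "Uebung" "Maedchen" "schoen"
```

Since the patterns here are literal strings rather than regular
expressions, fixed=TRUE may be worth trying as an alternative to
perl=TRUE.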

Duncan Murdoch

> This letter is displayed as "Ãœ" (as before), but R is no longer able to find this character. The
> problem seems to be linked to the "œ" part, since I could substitute for "Ã" without a problem.
> Strangely, if I get the two characters by extracting them with the substr command to a variable
> and then use the variable, I am able to substitute without a problem. Any ideas what I am
> missing?
>
> Thanks,
>
> Michael