[R] Problem comparing two strings

Björn Fisseler bjoern@||@@e|er @end|ng |rom goog|em@||@com
Mon Nov 18 16:11:44 CET 2019


Hello,

I'm struggling to compare two strings that come from different data 
sets. The strings look identical: "Alexander Jäger"

But when I compare these strings: string1 == string2
the result is FALSE.

Looking at the raw bytes used to encode the strings, the results are 
different:

string1: 41 6c 65 78 61 6e 64 65 72 20 4a c3 a4 67 65 72
string2: 41 6c 65 78 61 6e 64 65 72 20 4a 61 cc 88 67 65 72
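
For anyone who wants to reproduce this, here is a minimal sketch in base R that rebuilds both strings from the byte dumps above. The helper name hex_to_string is made up for this illustration, and the sketch assumes the bytes are valid UTF-8, as they appear to be:

```r
# Hypothetical helper: turn a space-separated hex dump into a UTF-8 string
hex_to_string <- function(hex) {
  bytes <- as.raw(strtoi(strsplit(hex, " ")[[1]], 16L))
  s <- rawToChar(bytes)
  Encoding(s) <- "UTF-8"  # mark the bytes as UTF-8 (assumption)
  s
}

string1 <- hex_to_string("41 6c 65 78 61 6e 64 65 72 20 4a c3 a4 67 65 72")
string2 <- hex_to_string("41 6c 65 78 61 6e 64 65 72 20 4a 61 cc 88 67 65 72")

string1 == string2            # the comparison described in this post

# Byte view: 16 bytes versus 17 bytes
length(charToRaw(string1))
length(charToRaw(string2))

# Code point view: string1 holds a single code point for the umlaut (U+00E4),
# string2 holds a plain "a" (U+0061) followed by a combining diaeresis (U+0308)
utf8ToInt(string1)[12]
utf8ToInt(string2)[12:13]
```

Looking at the code points with utf8ToInt() instead of the raw bytes makes it easier to see where the two strings diverge.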

string2 comes from the names of various files on my machine (macOS); 
string1 comes from a data file (CSV, UTF-8 encoding).

The difference is obviously the umlaut "ä", which is encoded with two 
bytes in string1 and with three bytes in string2. The question is how 
to change this. The problem makes it impossible to join the two data 
sets based on the names. I already checked the settings on my machine: 
Sys.getlocale() returns 
"de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8". 
Changing or forcing the encoding of the data did not give the results I 
expected.
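
One direction that might be worth exploring, offered only as a sketch: the two byte sequences look like the composed and the decomposed Unicode form of the same text, so both strings are valid UTF-8 and changing the declared encoding would not be expected to make them equal. Normalizing both sides to one form (here NFC) before comparing or joining could help. The example assumes the stringi package is installed and uses its stri_trans_nfc() function; the variable names are illustrative only:

```r
# Composed form: "ä" as the single code point U+00E4
string1 <- "Alexander J\u00e4ger"
# Decomposed form: "a" followed by the combining diaeresis U+0308
string2 <- "Alexander Ja\u0308ger"

string1 == string2   # differs before normalization

# Assumption: stringi is available; skip quietly if it is not
if (requireNamespace("stringi", quietly = TRUE)) {
  norm1 <- stringi::stri_trans_nfc(string1)
  norm2 <- stringi::stri_trans_nfc(string2)
  print(norm1 == norm2)             # compare after normalizing both sides
  print(charToRaw(norm2))           # bytes of the normalized file-name string
}
```

If this works on the real data, the same normalization could be applied to the name columns of both data sets before joining them.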

What else can I try?

Best regards

         Björn

