[R] file reading /problems with encoding

T.Wunder at stud.uni-heidelberg.de
Tue Mar 2 09:32:02 CET 2010


Quoting Uwe Ligges <ligges at statistik.tu-dortmund.de>:

> R is not able to re-encode the file to the native encoding. But if you
> keep it in UTF-8, what is the problem to grep for the specific
> characters (as grep and friends support the argument useBytes these
> days)?


The problem with UTF-8 is that I am not able to cat a valid XML file.
Using the encoding="UTF-8" option in either the file() or the
readLines() call causes an error. If I leave it out of both, I
cannot run a gsub command on the string because of the special
characters, even with the useBytes option turned on:
grep("über 40%",xml,useBytes=TRUE)
returns integer(0). The cause seems obvious: when the file was
read in, the "ü" was turned into "üb".
However, I suspect that I did not use the useBytes option
correctly, did I?
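
To make the question concrete, here is a minimal sketch of a byte-wise
match. It assumes (this is only a guess at the cause) that the pattern
typed in the session is in the native encoding while the file content is
UTF-8, so the bytes differ. The temporary file, the sample line and all
variable names are made up for illustration.

```r
# Hypothetical sample: one line of UTF-8 text written to a temporary file.
# "\u00fc" is the letter u-umlaut; enc2utf8() guarantees UTF-8 bytes.
tmp <- tempfile(fileext = ".xml")
writeLines(enc2utf8("<note>\u00fcber 40%</note>"), tmp, useBytes = TRUE)

# Read the raw lines; no re-encoding to the native encoding is requested.
xml <- readLines(tmp)

# Byte-wise, literal match with a pattern that is itself UTF-8.
utf8_pattern <- enc2utf8("\u00fcber 40%")
hits <- grep(utf8_pattern, xml, fixed = TRUE, useBytes = TRUE)

# The same text in Latin-1 has a different byte for the umlaut (fc instead
# of c3 bc), so a byte-wise comparison against UTF-8 content finds nothing.
latin1_pattern <- iconv(utf8_pattern, from = "UTF-8", to = "latin1")
miss <- grep(latin1_pattern, xml, fixed = TRUE, useBytes = TRUE)

print(hits)                      # 1
print(miss)                      # integer(0)
print(charToRaw(utf8_pattern))   # shows the bytes actually compared
unlink(tmp)
```

If this guess is right, useBytes=TRUE only helps when the pattern and the
text carry the same bytes; fixed=TRUE is used here so that the pattern is
taken literally rather than as a regular expression.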

Thanks a lot for your help!

Best regards, Tom


