[R] file reading /problems with encoding

T.Wunder at stud.uni-heidelberg.de T.Wunder at stud.uni-heidelberg.de
Mon Mar 1 15:45:04 CET 2010


I'm a little worried about a problem that occurred recently when I
tried to read in an XML file (in order to replace some variables in the
string with values from a data frame). The main problem is
the encoding of the XML file. Since it is generated by Word 2007, its
encoding is UTF-8 (as can be seen in the XML header).
Now I'm establishing a file connection with
> channel <- file(filename,open="r+", encoding="UTF-8")
> ## filename = name of the file

To read the whole file, I'm using the readLines() function as follows:
> t <- readLines(channel, n=-1,warn=F, encoding="UTF-8")

Finally, I'm merging the lines of this character vector into one string with the following:
> xml <- ""
> for(i in 1:length(t)) {
>    xml <- paste(xml,t[i],sep="")
> }

(is there a better way of doing this?)
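
One possibility, as a minimal sketch: paste() has a collapse argument that joins all elements of a character vector into a single string, which would make the loop unnecessary. The vector `lines` below is a hypothetical stand-in for the result of readLines(), not the real file.

```r
# Hypothetical stand-in for the character vector returned by readLines().
lines <- c("<root>", "<a>x</a>", "</root>")

# collapse = "" concatenates all elements into one string with no separator,
# which is what the for loop above does element by element.
xml <- paste(lines, collapse = "")
```

Unlike 1:length(t), this also behaves sensibly for an empty vector, where it simply gives "".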

However, when I execute those lines, I get a warning like:
"incorrect input in the input-connection"
When I look at the output variable xml, the cause becomes apparent: the string
stops at a combination of Chinese or Japanese characters (which
normally shouldn't be a problem for UTF-8 encoding).

So that is the problem. How can I read the whole XML file into R as
a single string? I need to have the correct encoding, because I want to
grep for special characters such as "ü".
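
For illustration, here is a sketch of one approach that might avoid the problem. It rests on an assumption that is not confirmed here: that the warning arises because encoding="UTF-8" on the connection makes R convert the input into the native encoding of the session, which may not be able to represent the Asian characters. The sketch reads the raw bytes without any conversion and only marks the result as UTF-8. The function name read_utf8_string is hypothetical.

```r
# Hypothetical helper: read a whole file into one string without re-encoding.
read_utf8_string <- function(path) {
  # Read every byte of the file as-is; no encoding conversion takes place.
  n <- file.info(path)$size
  bytes <- readBin(path, what = "raw", n = n)

  # Turn the bytes into a single string (this fails on embedded NUL bytes,
  # which a text XML file should not contain).
  txt <- rawToChar(bytes)

  # Declare that the bytes are UTF-8; this marks the string, it does not
  # convert it.
  Encoding(txt) <- "UTF-8"
  txt
}
```

It could then be used as xml <- read_utf8_string(filename), followed by something like grepl("ü", xml, fixed = TRUE). Line breaks from the file stay in the string, so they would have to be removed separately if they are not wanted.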

Thank you for your help!

Kind regards, Tom

P.S. I'd rather not use the XML package, since I don't want to
parse the XML file :)