[R] Text Encoding

Milan Bouchet-Valat nalimilan at club.fr
Sat Apr 6 11:47:07 CEST 2013


On Friday, 5 April 2013 at 14:30 -0400, Emily Ottensmeyer wrote:
> Dear R-Help,
> 
> I am using R 2.14 with the RDF package to download data
> from a website, and then using R to manipulate it.
> 
> Text on the website is UTF-8.  The RDF package's rdf_load command is
> converting it into a different encoding, which converts non-ASCII
> characters to unicode codes.
> 
> On the webpage/sparql RDF: "4.5g of cDNA was used"
> 
> In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
> 
> I can't seem to convert it back from \\u00B5  into "".
Beware that \\u00B5 is the micro sign (Greek letter mu), not "". This is
probably an important piece of information...

> I've tried iconv with various settings without success:
> > iconv(test, "latin1", "UTF-8")
> [1] "4.5\\u00B5g of cDNA was used"
\\u00B5 looks like UTF-16, not UTF-8. Does this work?
iconv(test, "UTF-16", "UTF-8")
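If the string really holds a literal backslash followed by "u00B5" (which
the doubled backslash in the printed output suggests), then it is plain
ASCII and iconv() has nothing to convert. In that case the escape would
have to be decoded explicitly. Here is a minimal sketch in base R, assuming
only four-digit backslash-u escapes occur; decode_unicode_escapes is a
hypothetical helper name, not something provided by the RDF package:

```r
# Hypothetical helper: replace literal \uXXXX escapes with the characters they name.
decode_unicode_escapes <- function(x) {
  # Locate every backslash-u followed by exactly four hex digits.
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    # Drop the leading "\u" and read the remaining digits as a hex code point.
    code <- strtoi(substring(esc, 3), base = 16L)
    # Convert each code point to a one-character UTF-8 string.
    vapply(code, intToUtf8, character(1))
  })
  x
}

# The string as reported in the question: a literal backslash, then "u00B5".
test <- "4.5\\u00B5g of cDNA was used"
decoded <- decode_unicode_escapes(test)
print(decoded)
```

If installing a package is an option, stringi::stri_unescape_unicode()
should do the same job without a hand-written helper.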

> And, I tried Encoding, to see if I could figure that out, but it returns
> "unknown" on my string.
> > Encoding(test)
> [1] "unknown"
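Encoding() reporting "unknown" is expected here rather than a sign of
trouble: R only marks strings that contain non-ASCII characters, and a
string holding a literal backslash-u escape is pure ASCII. A small check,
assuming test holds exactly the value printed above:

```r
# Assumed value of the string from the question (literal backslash, then "u00B5").
test <- "4.5\\u00B5g of cDNA was used"

# ASCII-only strings are never marked, so Encoding() reports "unknown".
enc_escaped <- Encoding(test)

# Every character is below code point 128, i.e. the string is pure ASCII.
is_ascii <- all(utf8ToInt(test) < 128L)

# For contrast, a string holding the real micro sign is marked as UTF-8.
enc_real <- Encoding(intToUtf8(181L))

print(c(enc_escaped, enc_real))
print(is_ascii)
```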
> 
> 
> Anyone have any ideas on how to correct/convert the text encoding?
Can you provide us with the file, or at least the relevant parts of it?

You can also try loading the file using xmlParse() from the XML package.


Regards


