[R] Text Encoding
dwinsemius at comcast.net
Sat Apr 6 16:37:09 CEST 2013
On Apr 5, 2013, at 11:30 AM, Emily Ottensmeyer wrote:
> Dear R-Help,
> I am using R 2.14 with the RDF package to download data
> from a website, and then use R to manipulate it.
> Text on the website is UTF-8. The RDF package's rdf_load command is
> converting it into a different encoding, which converts non-ASCII
> characters to Unicode escape codes.
> On the webpage/sparql RDF: "4.5µg of cDNA was used"
> In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
> I can't seem to convert it back from \\u00B5 into "µ".
> I've tried iconv with various settings without success:
>> iconv(test, "latin1", "UTF-8")
> [1] "4.5\\u00B5g of cDNA was used"
> I also tried Encoding() to see what R thinks the encoding is, but it returns
> "unknown" for my string:
>> Encoding(test)
> [1] "unknown"
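A likely reason neither call helps can be sketched as follows, assuming `test` holds a literal backslash followed by `u00B5` rather than the micro sign itself. Such a string is pure ASCII, so iconv() has nothing to re-encode and Encoding() has no mark to report.

```r
# Assumption: `test` is what rdf_load returned, i.e. a literal backslash
# followed by "u00B5", not the micro sign.
test <- "4.5\\u00B5g of cDNA was used"

# Every byte is below 0x80, so the string is plain ASCII.
all(as.integer(charToRaw(test)) < 128L)

# ASCII strings carry no encoding mark, hence "unknown".
Encoding(test)

# Re-encoding ASCII changes nothing: the result equals the input.
identical(iconv(test, "latin1", "UTF-8"), test)
```

All three checks point the same way: the problem is the content of the string, not its declared encoding.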
On my device, entering "4.5\\u00B5g of cDNA was used"
returns "4.5\\u00B5g of cDNA was used",
but entering "4.5\u00B5g of cDNA was used" returns
"4.5µg of cDNA was used".
> nchar("4.5\\u00B5g of cDNA was used")
[1] 27
> nchar("4.5\u00B5g of cDNA was used")
[1] 22
So in the first case the doubled "\" is really a single literal backslash character, which does not escape the four hex digits that follow, whereas in the second case "\u00B5" is parsed as a proper micro character (at least on my setup with my fonts).
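The difference can be inspected character by character. This is only a sketch; the variable names `escaped` and `real` are made up for illustration.

```r
escaped <- "4.5\\u00B5g of cDNA was used"  # literal backslash, then "u00B5"
real    <- "4.5\u00B5g of cDNA was used"   # parser turns \u00B5 into one character

# Characters 4 to 9 of the first string are six separate ASCII characters.
substr(escaped, 4, 9)
nchar(substr(escaped, 4, 9))    # 6

# Character 4 of the second string is the micro sign, code point 0xB5.
substr(real, 4, 4)
utf8ToInt(substr(real, 4, 4))   # 181

# cat() writes the raw contents, without the escaping that print() adds.
cat(escaped, "\n")
cat(real, "\n")
```

print() shows the backslash doubled because it displays strings in a form that could be typed back in; cat() shows what is actually stored.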
If this is a systematic problem, you should contact the package maintainer with a full problem description and a link to the website. If it is just a one-off problem, simply remove the extraneous backslash.
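If the backslash is in the downloaded data rather than in typed source, deleting it alone would leave "u00B5" behind. One possible workaround, sketched here in base R, is to translate each literal escape into the character it names. The helper name `unescape_unicode` is hypothetical and not part of the RDF package or any other package; it only handles four-hex-digit escapes.

```r
# Hypothetical helper: replace each literal "\uXXXX" sequence in a
# character vector with the character that code point denotes.
unescape_unicode <- function(x) {
  pattern <- "\\\\u[0-9A-Fa-f]{4}"   # regex: backslash, "u", four hex digits
  hits <- gregexpr(pattern, x)
  regmatches(x, hits) <- lapply(regmatches(x, hits), function(m) {
    # Drop the leading "\u", read the hex digits, convert to a character.
    vapply(m, function(code) intToUtf8(strtoi(substring(code, 3), 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

test  <- "4.5\\u00B5g of cDNA was used"
fixed <- unescape_unicode(test)
nchar(fixed)       # 22
cat(fixed, "\n")
```

Whether the micro sign then displays correctly still depends on the locale and fonts in use.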
R version 3.0.0 RC (2013-03-31 r62463)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
> Anyone have any ideas on how to correct/convert the text encoding?
Alameda, CA, USA