[R] Text Encoding
David Winsemius
dwinsemius at comcast.net
Sat Apr 6 16:37:09 CEST 2013
On Apr 5, 2013, at 11:30 AM, Emily Ottensmeyer wrote:
> Dear R-Help,
>
> I am using R 2.14 with the RDF package to download data
> from a website, and then use R to manipulate it.
>
> Text on the website is UTF-8. The RDF package's rdf_load command is
> converting it into a different encoding, which converts non-ASCII
> characters to unicode codes.
>
> On the webpage/sparql RDF: "4.5µg of cDNA was used"
>
> In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
>
> I can't seem to convert it back from \\u00B5 into "µ".
>
> I've tried iconv with various settings without success:
>> iconv(test, "latin1", "UTF-8")
> [1] "4.5\\u00B5g of cDNA was used"
>
> And, I tried Encoding, to see if I could figure that out, but it returns
> "unknown" on my string.
>> Encoding(test)
> [1] "unknown"
>
On my device entering this: "4.5\\u00B5g of cDNA was used"
... returns [1] "4.5\\u00B5g of cDNA was used"
But entering: "4.5\u00B5g of cDNA was used" returns:
[1] "4.5µg of cDNA was used"
> nchar("4.5\\u00B5g of cDNA was used")
[1] 27
> nchar("4.5\u00B5g of cDNA was used")
[1] 22
So the doubled "\" in the first case is really a single literal backslash character, and it does nothing to escape the four hex digits that follow. In the second case, "\u00B5" is a proper Unicode escape that yields the micro sign (it displays correctly on my setup with my fonts).
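To see the difference without the extra layer of escaping that print() adds, it can help to look at the raw contents with writeLines(). This is only a small sketch; the variable names `escaped` and `literal` are mine, not from the original post.

```r
# 'escaped' holds a literal backslash followed by the text u00B5
escaped <- "4.5\\u00B5g of cDNA was used"
# 'literal' holds the actual micro sign (U+00B5)
literal <- "4.5\u00B5g of cDNA was used"

print(escaped)       # print() re-escapes the backslash: "4.5\\u00B5g of cDNA was used"
writeLines(escaped)  # the real contents: 4.5\u00B5g of cDNA was used

# The string with the stray backslash is five characters longer
nchar(escaped)       # 27
nchar(literal)       # 22

# Confirm which string actually contains a backslash
grepl("\\", escaped, fixed = TRUE)   # TRUE
grepl("\\", literal, fixed = TRUE)   # FALSE
```

So the string you got from the package contains no non-ASCII character at all, only ASCII text that spells out an escape sequence.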
If this is a systematic problem then you should contact the maintainer with a full problem description and a link to the website. If this is just a one-off problem just remove the extraneous backslash.
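If the escapes turn up in many strings, a workaround along the following lines might do until the package is fixed. It is a sketch only: `unescape_unicode` is a hypothetical helper of my own (not part of the RDF package), and it assumes every escape has exactly four hex digits, so characters outside the Basic Multilingual Plane that arrive as surrogate pairs are not handled.

```r
# Hypothetical helper: replace each literal \uXXXX sequence in a character
# vector with the Unicode character it names.
unescape_unicode <- function(x) {
  # Locate every backslash + u + four hex digits
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    # Drop the leading backslash and 'u', then parse the hex digits
    codes <- strtoi(substring(esc, 3), base = 16L)
    # Turn each code point into a one-character UTF-8 string
    vapply(codes, intToUtf8, character(1))
  })
  x
}

test  <- "4.5\\u00B5g of cDNA was used"
fixed <- unescape_unicode(test)
writeLines(fixed)
nchar(fixed)   # 22
```

Strings without any escapes pass through unchanged, so it should be safe to apply to a whole column of triples.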
--
David.
> sessionInfo()
R version 3.0.0 RC (2013-03-31 r62463)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
<snipped>
> Anyone have any ideas on how to correct/convert the text encoding?
>
>
> Thanks!
> -Emily
>
David Winsemius
Alameda, CA, USA