[R] Unicode normalization?
Allan Engelhardt
allane at cybaea.com
Wed Jun 17 17:35:43 CEST 2009
Does R support Unicode normalization? For my application, I would like
to test for canonical equivalence (e.g. "n\u0303" is canonically
equivalent to "\u00F1", which is ñ) and ideally convert strings to NFD
form. ("\u0303" is U+0303, the "combining tilde" character.) Is there a
package for this?
The Unicode Normalization FAQ [1] states that "Programs should always
compare canonical-equivalent Unicode strings as equal", so is it in fact
a bug that "n\u0303" != "\u00F1" in my version of R?
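
For concreteness, here is a minimal sketch of what I am seeing. It uses
only base R; the variable names are just for illustration, and I am
assuming the \u escapes are parsed as UTF-8 strings as they are in my
session.

```r
# Two spellings of the same visible character (names are illustrative only)
decomposed  <- "n\u0303"  # U+006E followed by U+0303 (combining tilde)
precomposed <- "\u00F1"   # U+00F1 (n with tilde as a single code point)

# Code points that make up each string
cp_decomposed  <- utf8ToInt(decomposed)   # 110 771
cp_precomposed <- utf8ToInt(precomposed)  # 241

# Base R compares the strings as given, without normalizing them first
same_string <- identical(decomposed, precomposed)  # FALSE
equal_op    <- decomposed == precomposed           # FALSE

print(cp_decomposed)
print(cp_precomposed)
print(same_string)
print(equal_op)
```

So the two strings differ at the code point level, and both identical()
and == report them as different, even though they are canonically
equivalent.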
Allan
[1] see http://www.unicode.org/unicode/faq/normalization.html