[Rd] Encoding API
Thomas Friedrichsmeier
thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Feb 15 21:10:41 CET 2007
Hi!
I've been following the recent SVN log entries about encoding information in
CHARSXPs with great interest. This looks like a very nice addition. Since
this is still work in progress, I'd like to suggest one more feature:
At least in RKWard, all displayed strings need to be converted to UTF-8 (the
format RKWard uses to pass strings to Qt's QString). This needs to be done
independently of the current locale and of the encoding used in the embedded
R process. I imagine other graphical or non-graphical toolkits will similarly
use UTF-8 internally to store strings.
For this reason, it would be nice to have a function such as
char* Rf_translateCharToUTF8(SEXP);
which would translate to UTF-8 independently of the current LC_CTYPE. While
it is possible to achieve the same effect by first translating the strings to
the current LC_CTYPE encoding (using Rf_translateChar()) and then translating
them to UTF-8 in a second step (by custom means, if needed), doing the
conversion in a single step would be more elegant, and could also avoid
expensive recoding steps.
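The "custom" second step could look roughly like the following standalone sketch, which assumes a POSIX system providing iconv(). It is plain C, not R code, and the helper name recode_to_utf8 is hypothetical. In the two-step route, the source encoding would be the locale's codeset (for instance as reported by nl_langinfo(CODESET) after setlocale()); a single-step Rf_translateCharToUTF8 could instead recode directly from the encoding recorded in the CHARSXP, skipping the intermediate pass through the locale encoding.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of R): recode the NUL-terminated string
 * `in` from encoding `from` to UTF-8 using POSIX iconv().
 * Returns a malloc'ed string the caller must free(), or NULL if the
 * encoding is unknown, the input is invalid, or memory runs out. */
char *recode_to_utf8(const char *in, const char *from)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t inleft = strlen(in);
    size_t cap = inleft * 2 + 16;       /* initial guess, grown on E2BIG */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;             /* glibc's iconv() takes char ** */
    char *dst = out;
    size_t outleft = cap - 1;           /* reserve one byte for the NUL */

    while (inleft > 0) {
        if (iconv(cd, &src, &inleft, &dst, &outleft) != (size_t)-1)
            break;                      /* all input converted */
        if (errno != E2BIG) {           /* EILSEQ/EINVAL: bad input */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        /* Output buffer full: double it and continue where we stopped. */
        size_t used = (size_t)(dst - out);
        cap *= 2;
        char *bigger = realloc(out, cap);
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
        dst = out + used;
        outleft = cap - 1 - used;
    }

    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

For example, recode_to_utf8("caf\xe9", "ISO-8859-1") should return the bytes "caf\xc3\xa9". The set of encoding names accepted by iconv_open() varies between platforms.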
Alternatively, having access to the IS_UTF8 and IS_LATIN1 macros from C would
be enough to hand-code an efficient conversion to UTF-8 (though that may be
too close to the internals).
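To illustrate why those two flags would already cover the common cases, here is a standalone sketch. The enum str_encoding and the function flagged_to_utf8 are hypothetical stand-ins for whatever IS_UTF8 and IS_LATIN1 would report; they are not part of R's API. A UTF-8 string only needs copying, and every Latin-1 byte is the code point U+0000..U+00FF, so it converts with simple bit arithmetic; only strings in the native encoding would still need a general recoder such as iconv.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-string encoding flag that the
 * IS_UTF8 / IS_LATIN1 macros would expose on a CHARSXP. */
typedef enum { ENC_NATIVE, ENC_UTF8, ENC_LATIN1 } str_encoding;

/* Hypothetical helper: return a malloc'ed UTF-8 copy of `in`, or NULL.
 *  - ENC_UTF8:   already UTF-8, so just copy it.
 *  - ENC_LATIN1: bytes < 0x80 are ASCII; bytes >= 0x80 become a
 *                two-byte UTF-8 sequence.
 *  - ENC_NATIVE: returns NULL; the caller must fall back to a general
 *                recoder (e.g. iconv from the locale's codeset). */
char *flagged_to_utf8(const char *in, str_encoding enc)
{
    size_t len = strlen(in);

    if (enc == ENC_UTF8) {
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, in, len + 1);
        return copy;
    }
    if (enc != ENC_LATIN1)
        return NULL;

    /* Worst case: every byte needs two UTF-8 bytes, plus the NUL. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p < 0x80) {
            *dst++ = (char)*p;                    /* ASCII is unchanged */
        } else {
            *dst++ = (char)(0xC0 | (*p >> 6));    /* lead byte: 0xC2 or 0xC3 */
            *dst++ = (char)(0x80 | (*p & 0x3F));  /* continuation byte */
        }
    }
    *dst = '\0';
    return out;
}
```

For example, flagged_to_utf8("caf\xe9", ENC_LATIN1) yields the bytes "caf\xc3\xa9" without any call into iconv.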
I'm not sure whether this is considered important enough to warrant inclusion
in the API, but I wanted to raise the idea in time.
Regards
Thomas Friedrichsmeier
More information about the R-devel mailing list