[R] on specifying an encoding for plot's main-argument
Daniel Bastos
dbastos at toledo.com
Mon Feb 1 20:56:01 CET 2016
Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> On 29/01/2016 10:35 AM, Daniel Bastos wrote:
>> Here's how I plot a graph.
>>
>> plot(c(1,2,3), main = "graph ç")
>>
>> The main-string has a UTF-8 character "ç". I believe I'm using the
>> windows device. It opens up on my screen. (The window says ``R
>> Graphics: Device 2 (ACTIVE)''.) How can I tell it to use my encoding of
>> choice?
>
> As far as I know that's impossible. R uses the system encoding, and I
> don't think any Windows versions use UTF-8 code pages. They use
> UTF-16 for wide characters, and some 8 bit encoding for byte-sized
> characters. R will use whatever 8 bit code page Windows chooses.
You seem to be correct. Here's what Microsoft has to say. ``[...]
UTF-16 [...] is the most common encoding of Unicode and the one used for
native Unicode encoding on Windows operating systems.''[1]
They also claim that ``[w]hile Unicode-enabled functions in Windows use
UTF-16, it is also possible to work with data encoded in UTF-8 or UTF-7,
which are supported in Windows as multibyte character set code
pages.''[1]
But I couldn't verify the claim.
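To make the difference between the two encodings concrete, here is a minimal C sketch that counts how many bytes a Unicode code point needs in UTF-8 and how many 16-bit units it needs in UTF-16. The function names utf8_len and utf16_units are my own, chosen for illustration; they are not part of any Windows or R API.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode code point cp in UTF-8 (0 if invalid). */
size_t utf8_len(uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Number of 16-bit code units needed to encode cp in UTF-16 (0 if invalid). */
size_t utf16_units(uint32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 1;
    if (cp <= 0x10FFFF) return 2;
    return 0;
}
```

For the "ç" in my title (U+00E7), utf8_len(0xE7) gives 2 while utf16_units(0xE7) gives 1. So a device that reads the title one byte at a time, in an 8-bit code page, sees two characters where I meant one.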
The documentation of setlocale[2] says the ``set of available locale
names, languages, country/region codes, and code pages includes all
those supported by the Windows NLS API except code pages that require
more than two bytes per character, such as UTF-7 and UTF-8. If you
provide a code page value of UTF-7 or UTF-8, setlocale will fail,
returning NULL.''[2]
That seems to be correct as per the following C code.

    const char *loc = setlocale(LC_ALL, "UTF-8");
    printf("locale: %s\n", loc ? loc : "(null)");

And [3] makes me think that _wsetlocale behaves the same way:
``_wsetlocale [...] is a wide-character version of setlocale; the
arguments and return values of _wsetlocale are wide-character strings.''
The following program seems to confirm it.
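
    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>
    int main(void) {
      const wchar_t *loc = _wsetlocale(LC_ALL, L"UTF-8"); /* wide literal, not a cast */
      printf("locale: %ls\n", loc ? loc : L"(null)");     /* NULL means failure */
      return 0;
    }
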
[...]
(*) A workaround
Since R comes with iconv(), the following might be a safe way to
translate a UTF-8 string into the encoding of the current system locale,
so that plot titles display correctly on Windows systems.

    iconv("utf8-string", from = "UTF-8",
          to = localeToCharset(Sys.getlocale("LC_CTYPE"))[1])

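To show what such a conversion does at the byte level, here is a rough C sketch that turns a UTF-8 string into ISO-8859-1 (Latin-1). This is only an illustration under my own assumptions: the function name utf8_to_latin1 is hypothetical, it is not how R's iconv() is implemented, and Latin-1 only agrees with Windows code page 1252 for code points U+00A0 to U+00FF (which includes "ç").

```c
#include <stddef.h>

/* Convert a NUL-terminated UTF-8 string to ISO-8859-1 (Latin-1).
 * Code points above U+00FF and malformed sequences become '?'.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Output is truncated to fit in dst; dstsize must be at least 1. */
size_t utf8_to_latin1(const char *src, char *dst, size_t dstsize) {
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    while (*s != 0 && n + 1 < dstsize) {
        unsigned char out;
        if (s[0] < 0x80) {
            /* ASCII: copy as-is */
            out = s[0];
            s += 1;
        } else if ((s[0] == 0xC2 || s[0] == 0xC3) && (s[1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out = (unsigned char)(((s[0] & 0x03) << 6) | (s[1] & 0x3F));
            s += 2;
        } else {
            /* does not fit in one byte: skip the lead byte and any
             * continuation bytes that follow it */
            out = '?';
            s += 1;
            while ((*s & 0xC0) == 0x80) s += 1;
        }
        dst[n++] = (char)out;
    }
    dst[n] = '\0';
    return n;
}
```

Given the title "graph ç" in UTF-8 (where "ç" is the two bytes 0xC3 0xA7), the function writes seven bytes, the last being the single byte 0xE7. That is the byte an 8-bit Western European code page expects for "ç".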
(*) References
[1] MSDN Unicode
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374081(v=vs.85).aspx
[2] MSDN setlocale
https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
[3] MSDN Locales and Code Pages
https://msdn.microsoft.com/en-us/library/8w60z792.aspx