[R] Problem with R coding
Ivan Krylov
|kry|ov @end|ng |rom d|@root@org
Tue Mar 12 17:08:42 CET 2024
On Tue, 12 Mar 2024 14:57:28 +0000
CALUM POLWART <polc1410 using gmail.com> wrote:
> That's almost certainly going to be either the utf-8 character in the
> path
The problem, as diagnosed by Maria in the first post of the thread, is
that the user home directory as known to R is stored in the ANSI
encoding instead of UTF-8, even though the session charset should be
UTF-8 because Maria's Windows is sufficiently new [*]. As soon as any
part of R tries to perform an encoding conversion on this path (for
example, file.exists() converting it from UTF-8 to UCS-2 in order to
call the Windows filesystem APIs), the conversion fails. Since tcltk2
calls file.exists('~/...') in its .onLoad, the package and every
package that hard-depends on it are now broken.
Normally, this path is determined automatically to be something like
C:\Users\username\Documents. With OneDrive taking over, it turns out to
be something else and, for some reason, in the wrong encoding (ANSI, but
marked as native, which here means UTF-8).
The function that determines this path lives in src/gnuwin32/shext.c
(char *getRUser(void)). It starts by looking at the environment
variable R_USER, which is why, in order to override OneDrive, the user
has to set it (on the command line or in the system settings). If
R_USER is not set, R tries the environment variable HOME (which is
usually set on Windows, isn't it?), then consults
SHGetKnownFolderPath(FOLDERID_Documents) (which returns the result as a
wchar_t[] that R has to convert to UTF-8 manually), then a few more
environment variables, and finally falls back to the current directory.
There is likely no easy way to use `subst` to give R a different home
drive.
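The lookup order described above can be summarised as a fallback chain. The Python sketch below models it under several assumptions. The environment is passed in as a plain mapping. The SHGetKnownFolderPath(FOLDERID_Documents) call is replaced by a caller-supplied function. The "few more environment variables" are left out because they are not named here. The function name get_r_user is hypothetical and only mirrors getRUser() in spirit; shext.c remains the authority on the real behaviour.

```python
import os
from typing import Callable, Mapping, Optional


def get_r_user(
    env: Mapping[str, str],
    documents_folder: Callable[[], Optional[str]],
    cwd: Callable[[], str] = os.getcwd,
) -> str:
    """Sketch of the getRUser() fallback order described in the post."""
    # 1. An explicit, non-empty R_USER wins; this is why setting it overrides OneDrive.
    if env.get("R_USER"):
        return env["R_USER"]
    # 2. Otherwise use HOME, if it is set and non-empty.
    if env.get("HOME"):
        return env["HOME"]
    # 3. Otherwise ask for the Documents folder (stand-in for
    #    SHGetKnownFolderPath(FOLDERID_Documents), whose wide string R converts to UTF-8).
    docs = documents_folder()
    if docs:
        return docs
    # (R consults a few more environment variables here; omitted in this sketch.)
    # 4. Last resort: the current directory.
    return cwd()


DOCS = "C:\\Users\\marga\\OneDrive - Fundación Universitaria San Pablo CEU\\Documentos"
print(get_r_user({}, lambda: DOCS))                        # falls through to the Documents folder
print(get_r_user({"R_USER": "C:\\RUser"}, lambda: DOCS))   # R_USER overrides it
```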
If I set %HOME% or even %R_USER% to a non-ASCII path without setting up
OneDrive, R works normally, so getenv() must be able to return
UTF-8-encoded values. I also don't see how ShellGetPersonalDirectory()
could fail in this manner. My remaining hypothesis is that OneDrive
somehow causes getenv("HOME") to return
"C:\\Users\\marga\\OneDrive - Fundación Universitaria San Pablo CEU\\Documentos"
in ANSI instead of UTF-8.
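Anyone producing such a trace can test the ANSI hypothesis quickly. Dump the raw bytes of the value (for example, what getenv("HOME") or getRUser() returns, inspected in a debugger) and classify them. The Python sketch below does that. The function name classify_bytes is hypothetical, and the sketch again assumes the ANSI code page is cp1252. It gives evidence, not proof, because some cp1252 strings also happen to be valid UTF-8.

```python
def classify_bytes(raw: bytes) -> str:
    """Classify a raw path value; a heuristic, not a proof of its encoding."""
    try:
        raw.decode("ascii")
        return "ascii"  # encoding-neutral: cannot trigger the conversion failure
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"  # what R assumes the native encoding to be
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("cp1252")
        return "ansi?"  # invalid UTF-8 but valid cp1252: consistent with the hypothesis
    except UnicodeDecodeError:
        return "unknown"


# Byte strings as they might appear in a debugger dump of the path.
print(classify_bytes(b"C:\\Users\\username\\Documents"))  # ascii
print(classify_bytes(b"Fundaci\xc3\xb3n"))                 # utf-8
print(classify_bytes(b"Fundaci\xf3n"))                     # ansi?
```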
If anyone here has OneDrive set up in this manner and can debug R,
a trace of what getRUser() actually does would be very useful.
--
Best regards,
Ivan
[*]
https://blog.r-project.org/2020/05/02/utf-8-support-on-windows/index.html
https://blog.r-project.org/2022/11/07/issues-while-switching-r-to-utf-8-and-ucrt-on-windows/index.html