[Rd] `basename` and `dirname` change the encoding to "UTF-8"
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Mon Jun 29 20:50:44 CEST 2020
On 29/06/2020 10:39 a.m., Johannes Rauh wrote:
> Dear R Developers,
>
> I noticed that `basename` and `dirname` always return "UTF-8" on Windows (tested with R-4.0.0 and R-3.6.3):
>
>> p <- "Föö/Bär"
>> Encoding(p)
> [1] "latin1"
>> Encoding(dirname(p))
> [1] "UTF-8"
>> Encoding(basename(p))
> [1] "UTF-8"
>
> Is this on purpose? At least I did not find any relevant comment in the documentation of `dirname`/`basename`.
>
> Background: I'm currently struggling with a directory name containing a latin1 character. (I know that this is a bad idea, but I did not create the directory and I cannot rename it.) I now want to pass a latin1 directory name to a function, which internally uses `tools::makeLazyLoadDB`. At that point, internally, `dirname` is called, which changes the encoding, and things break. If I use `debug` to halt the processing and "fix" the encoding, things work as expected.
>
> So, if possible, I would prefer that `dirname` and `basename` preserve the encoding.

Actually, makeLazyLoadDB isn't exported from tools, so strictly speaking
you shouldn't be calling it. Or perhaps you have a good reason to call
it, and should be asking for it to be exported, or you are calling a
published function which calls it: in either case it should probably be
fixed to accept UTF-8.
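
To check the export status yourself, something along these lines should do. It is a minimal sketch using only base R namespace functions; `in_namespace` and `is_exported` are just illustrative variable names.

```r
# Is makeLazyLoadDB defined inside the tools namespace at all?
in_namespace <- exists("makeLazyLoadDB",
                       envir = asNamespace("tools"),
                       inherits = FALSE)

# Is it among the names that tools exports to users?
is_exported <- "makeLazyLoadDB" %in% getNamespaceExports("tools")

print(c(in_namespace = in_namespace, is_exported = is_exported))
```

A function that is in the namespace but not exported is only reachable via `:::`, which is what marks it as internal.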

But it doesn't call dirname or basename, so maybe the function that
calls it is the one that needs fixing.

In any case, while asking dirname() and basename() to preserve the
encoding sounds reasonable, it seems like it would just be covering up a
deeper problem.
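
If a stopgap is needed in the meantime, a wrapper like the following could convert the result back. This is only a sketch: `dirname_latin1` is a hypothetical helper, not something in base R, and it assumes the session locale can represent the characters involved (it will not round-trip in a plain C locale).

```r
# Hypothetical helper (not part of base R): call dirname(), then convert
# every element whose *input* was marked "latin1" back to latin1.
dirname_latin1 <- function(path) {
  out <- dirname(path)
  was_latin1 <- Encoding(path) == "latin1"
  # enc2utf8() normalises the result to UTF-8 first, so iconv() has a
  # known source encoding however dirname() marked its output.
  out[was_latin1] <- iconv(enc2utf8(out[was_latin1]),
                           from = "UTF-8", to = "latin1")
  out
}

p <- "F\xf6\xf6/B\xe4r"     # "Föö/Bär" written with latin1 byte escapes
Encoding(p) <- "latin1"
print(Encoding(dirname(p)))         # what dirname() reports on this platform
print(Encoding(dirname_latin1(p)))  # what the wrapper reports
```

Pure ASCII paths pass through unchanged, since R never marks ASCII strings with an encoding. This only papers over the symptom, of course; the code that chokes on UTF-8 is still what ought to be fixed.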

Duncan Murdoch