[R-pkg-devel] handling of byte-order-mark on r-devel-linux-x86_64-debian-clang machine
Ivan Krylov
kry|ov@r00t @end|ng |rom gm@||@com
Sat Mar 26 14:58:42 CET 2022
On Sat, 26 Mar 2022 11:34:00 +0000
Daniel Kelley <Dan.Kelley using Dal.Ca> wrote:
> This file starts with a byte-order-mark, and this is skipped over on
> all but the r-devel-linux-x86_64-debian-clang machine
Could you please explain how you came to this conclusion? I don't have
much experience with testthat, but looking at the recent results at
<https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/oce-00check.html>,
it seems that the whole header is mis-decoded as latin-1, not just the
BOM included in one field name.
Please correct me if my understanding is wrong, but I'm seeing you use
readLines to read the header of the file:
>> # encoding defaults to "UTF-8-BOM"
>> text <- readLines(file, 1, encoding=encoding, warn=FALSE)
The `encoding` argument of readLines() is documented as follows:
>> encoding: encoding to be assumed for input strings. It is used to
>> mark character strings as known to be in Latin-1 or UTF-8: it is
>> not used to re-encode the input. To do the latter, specify the
>> encoding as part of the connection ‘con’ or via
>> ‘options(encoding=)’: see the examples.
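To illustrate what "mark, not re-encode" means in practice, here is a small sketch. The temporary file and its contents ("caf" followed by a UTF-8 e-acute) are made up for the example and have nothing to do with the oce test data:

```r
# Hypothetical input: the bytes of "caf", then U+00E9 in UTF-8 (C3 A9), then "\n".
path <- tempfile()
writeBin(c(charToRaw("caf"), as.raw(c(0xC3, 0xA9, 0x0A))), path)

plain  <- readLines(path, 1, warn = FALSE)                     # no declared encoding
marked <- readLines(path, 1, encoding = "UTF-8", warn = FALSE)

# The argument only changes the declared encoding of the result...
print(Encoding(marked))

# ...while the bytes that were read stay exactly the same.
same_bytes <- identical(charToRaw(plain), charToRaw(marked))
print(same_bytes)

unlink(path)
```

Nothing is converted here: the `encoding` argument only tells R how to label the bytes it has already read.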
It's unfortunate that you lack a clean way of reproducing the problem
(shouldn't it consistently fail on all glibc/libiconv/??? versions?),
but I think that the right thing to do here is to use
readLines(file(file, encoding = encoding), ...)
...and not the `encoding` argument of readLines(). (See also the
somewhat confusing "Encoding" section in ?file.)
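A minimal sketch of what I mean, using a made-up one-line file rather than your actual data (the header text "temperature,salinity" is only a placeholder):

```r
# Hypothetical input: a UTF-8 byte-order mark (EF BB BF) followed by a header line.
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("temperature,salinity\n")), path)

# Declare the encoding on the connection, so that the connection does the
# decoding; "UTF-8-BOM" additionally drops a leading BOM if one is present.
con <- file(path, encoding = "UTF-8-BOM")
header <- readLines(con, 1, warn = FALSE)
close(con)  # keep a handle so that the connection can be closed explicitly

print(header)
unlink(path)
```

Keeping the connection in a variable and closing it (or using on.exit(close(con)) inside a function) avoids the "closing unused connection" warning that the one-liner above could otherwise trigger later.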
Taking another look at the check log, I see:
>> using session charset: ISO8859-15
Since readLines() seems to return text with Encoding(.) ==
'unknown' (i.e. native encoding) when it doesn't recognise its
`encoding` argument, I guess what happens here is that the UTF-8 text is
interpreted as ISO8859-15. The same thing used to happen on Windows,
where the native encoding is the current ANSI code page, which gives me
reason to hope that the test will start passing on Windows too once you
apply the fix.
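Here is a rough sketch of that guess, again with a made-up file. I use "latin1" as a stand-in for ISO8859-15 in the iconv() call, on the assumption that it is accepted everywhere; the three BOM bytes map to the same characters in both encodings:

```r
# Hypothetical input: "caf" plus a UTF-8 e-acute, without a BOM.
path <- tempfile()
writeBin(c(charToRaw("caf"), as.raw(c(0xC3, 0xA9, 0x0A))), path)

# "UTF-8-BOM" is not a value that readLines() itself understands, so the
# result carries no encoding mark and is taken to be in the native encoding.
x <- readLines(path, 1, encoding = "UTF-8-BOM", warn = FALSE)
print(Encoding(x))
unlink(path)

# What the three BOM bytes turn into when they are decoded as a single-byte
# Latin encoding, which is what a session with such a charset would do.
bom <- rawToChar(as.raw(c(0xEF, 0xBB, 0xBF)))
misread <- iconv(bom, from = "latin1", to = "UTF-8")
print(utf8ToInt(misread))  # code points of the three resulting characters
```

If the field names in the failing check start with three stray accented or punctuation characters, that would be consistent with this explanation.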
--
Best regards,
Ivan