[R-pkg-devel] Warning... unable to translate 'Ekstr<f8>m' to a wide string; Error... input string 1 is invalid

Ivan Krylov
Tue Jul 19 20:56:23 CEST 2022


On Tue, 19 Jul 2022 13:23:11 -0500
Spencer Graves <spencer.graves using effectivedefense.org> wrote:

> So what's the recommended fix?

Is subNonStandardCharacters() supposed to work with strings that have
Encoding(.) == 'unknown' and are also invalid in the current locale
encoding? (I think it's fair to not support Encoding(.) == 'bytes' for
such a function, because such strings aren't supposed to be text.)

If yes, the function itself needs to be fixed. I think that
useBytes=TRUE may help, as long as the standardCharacters argument is
limited to characters representable in ASCII. Alternatively, find a way
to transform the 'x' argument into something that is guaranteed to be
valid in its declared encoding. enc2utf8() could be an option, but any
invalid bytes are replaced by their <hexadecimal codes>, which defeats
the purpose of subNonStandardCharacters(). Perhaps the output of
Encoding(x) could be fed to iconv() as its "from" argument?
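
A rough sketch of both options, in case it helps. The helper name
to_utf8() and its 'fallback' argument are made up for illustration and
are not part of any package; the fallback encoding is only a guess at
what the bytes really mean, because iconv() cannot find that out by
itself. I also assume that "standard characters" means printable ASCII.

```r
# "Ekstr\xf8m" built from raw bytes: 0xf8 is "ø" in Latin-1, but the
# string carries no encoding mark and is not valid UTF-8.
x <- rawToChar(as.raw(c(0x45, 0x6b, 0x73, 0x74, 0x72, 0xf8, 0x6d)))
Encoding(x)  # "unknown"

# Option 1: match bytes instead of characters. This only makes sense
# while the set of allowed characters is plain ASCII.
ascii_only <- gsub("[^\\x20-\\x7e]", "_", x, perl = TRUE, useBytes = TRUE)

# Option 2 (hypothetical helper): use Encoding(x) to choose the "from"
# argument of iconv(). For unmarked strings, try the locale encoding
# first and then fall back to an assumed encoding.
to_utf8 <- function(x, fallback = "latin1") {
  enc <- Encoding(x)
  if (any(enc == "bytes")) stop("strings marked as 'bytes' are not text")
  from <- ifelse(enc == "unknown", "", enc)  # "" means the locale encoding
  vapply(seq_along(x), function(i) {
    out <- iconv(x[i], from = from[i], to = "UTF-8")
    # iconv() returns NA when the input is invalid in 'from'
    if (is.na(out)) out <- iconv(x[i], from = fallback, to = "UTF-8")
    out
  }, character(1), USE.NAMES = FALSE)
}

utf8 <- to_utf8(x)
Encoding(utf8)  # "UTF-8"
```

Option 1 loses the information about which character was there, while
option 2 keeps it at the price of guessing the original encoding.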

If not, it's enough to fix the example.

> 	  If I understand correctly, "\u**" should work with ** being
> f8, f6, df, or fc [all hex digits, I assume?].  However, "\u00**" may
> be preferred over "\u**", and "\u{**}" may be better still.

This is described in ?Quotes, although admittedly it is harder to find
than one would like. The "\u" escape sequences take 1 to 4 hexadecimal digits. As
long as your escape sequence isn't followed by something that looks
like a hexadecimal digit, you can keep it short, like "\uf8m" (m is not
a hex digit). If you want to be 100% unambiguous, either padding the
code point number to 4 digits ("\u00f8m") or wrapping it into braces
("\u{f8}m") is enough. The belt-and-braces approach ("\u{00f8}m") is
not an error, either.
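
A small illustration of the four spellings, and of what can go wrong
when the short form is followed by a hex digit (the variable names are
only for the example):

```r
# Four spellings of the same string; "m" is not a hex digit, so the
# short form is safe here.
s1 <- "Ekstr\uf8m"
s2 <- "Ekstr\u00f8m"
s3 <- "Ekstr\u{f8}m"
s4 <- "Ekstr\u{00f8}m"
stopifnot(identical(s1, s2), identical(s2, s3), identical(s3, s4))

# "a" is a hex digit, so the short form swallows it.
ambiguous <- "\uf8a"     # one character, U+0F8A
explicit  <- "\u{f8}a"   # two characters, U+00F8 followed by "a"
utf8ToInt(ambiguous)     # 3978, i.e. 0x0f8a
utf8ToInt(explicit)      # 248 97
```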

You can also use the Encoding(x) <- 'latin1' trick to mark the strings
produced from bytes as Latin-1. Then gsub() will work normally, the
same way things happily work in example(iconv).
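
Something along these lines, modelled on example(iconv). This assumes
the bytes really are Latin-1, since setting Encoding() only declares
the encoding and converts nothing:

```r
x <- "Ekstr\xf8m"        # byte 0xf8, no encoding mark yet
Encoding(x) <- "latin1"  # declare how the bytes are to be read

# gsub() now sees a valid string and can match the character itself.
y <- gsub("\u00f8", "o", x)
y  # "Ekstrom"

# Converting to UTF-8 works as well: 0xf8 becomes the bytes c3 b8.
charToRaw(enc2utf8(x))
```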

-- 
Best regards,
Ivan


