[R-pkg-devel] Warning... unable to translate 'Ekstr<f8>m' to a wide string; Error... input string 1 is invalid

Spencer Graves spencer.graves at effectivedefense.org
Tue Jul 19 20:23:11 CEST 2022


Hi, Ivan et al.:


On 7/19/22 1:03 PM, Ivan Krylov wrote:
> On Tue, 19 Jul 2022 12:32:20 -0500
> Spencer Graves <spencer.graves at effectivedefense.org> wrote:
> 
>> Can someone provide me with a link to the correct development
>> version of help('iconv')?  The current version includes the exact
>> offending "\x" strings that I have.
> 
> http://svn.r-project.org/R/trunk/src/library/base/man/iconv.Rd
> 
> It still does, because it works with byte strings the right way: by
> passing them to iconv(), which is designed to work with bytes in
> "unknown" encodings.
> 
> In contrast, your use of arbitrary bytes with gsub() is invalid,
> because gsub() assumes that the strings match their declared encoding:
> UTF-8, Latin-1, or the native locale encoding. (See ?Encoding.)
> 
> When you write "Ekstr\xf8m", you get a string that consists of Latin-1
> bytes but has the wrong encoding property set. Given this string,
> gsub() and friends will break on a UTF-8 system (because "r\xf8m" is
> not a valid UTF-8 sequence of bytes), while iconv() will not.
> 
> Depending on the desired semantics of subNonStandardCharacters(), you
> might be able to avoid the failures with the useBytes argument, or you
> might silently return invalid data in some corner cases. Is the "x"
> argument supposed to be bytes in an arbitrary encoding, or properly decoded
> characters that might include those that don't map to ASCII?
> 
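To make the diagnosis above concrete, here is a minimal R sketch. The variable names are only for illustration, and validUTF8() assumes R 3.3.0 or later. It shows that "\xf8" stores the single Latin-1 byte 0xF8, that the resulting string is not valid UTF-8, and that declaring the bytes as Latin-1 gives R what it needs to translate them:

```r
# "\xf8" inserts one raw byte, 0xF8 (o with stroke in Latin-1),
# but the string is not declared as Latin-1.
x_bad <- "Ekstr\xf8m"

Encoding(x_bad)      # "unknown" in a UTF-8 locale
charToRaw(x_bad)     # 45 6b 73 74 72 f8 6d
validUTF8(x_bad)     # FALSE: byte 0xF8 can never appear in valid UTF-8

# Declaring the bytes as Latin-1 lets R translate them to UTF-8
x_latin1 <- x_bad
Encoding(x_latin1) <- "latin1"
x_utf8 <- enc2utf8(x_latin1)
charToRaw(x_utf8)    # 45 6b 73 74 72 c3 b8 6d
validUTF8(x_utf8)    # TRUE
```

Note that Encoding<- only changes the label, not the bytes, so it is correct only when the bytes really are Latin-1.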

	  Wow.  So what's the recommended fix?


	  If I understand correctly, "\u**" should work with ** being f8, f6, 
df, or fc [all hex digits, I assume?].  However, "\u00**" may be 
preferred over "\u**", and "\u{**}" may be better still.


	  The blog post that Tomas wrote might be more useful if it included a 
recommendation like this.


	  Thanks for all your work to make R better and thereby help people 
everywhere extract better information from the data available to them.


	  Spencer


