[Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

Tomas Kalibera
Fri Feb 8 17:23:17 CET 2019


I can reproduce this with read.table(encoding="UTF-8") in RGui on Windows 10,
reading a file containing the two UTF-8 characters. The table is read
correctly into R as documented (both characters are represented in UTF-8
and marked as such), but the conversion of Infinity to 8 and of Zhe to
<U+0436> happens later, during printing with print.data.frame(). For
instance, it currently does not happen during print(as.matrix()). As I
wrote in more detail in another email in this thread, R sometimes needs
to convert strings to the current native encoding. Windows converts
Infinity to 8 by default as a "best fit", but fails to convert Zhe, so R
displays <U+0436>.
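
A minimal sketch of how one might check this, assuming a temporary file
written as raw UTF-8 bytes stands in for the attached CSV (the file and
the variable names tmp and df are illustrative, not from the report).
It inspects the stored strings rather than relying on how they print:

```r
# Write the two characters as raw UTF-8 bytes, so the example does not
# depend on the encoding of the script or of the session.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("\u221e,\u0436\n"), tmp)

# stringsAsFactors = FALSE keeps the columns as character vectors.
df <- read.table(tmp, sep = ",", encoding = "UTF-8",
                 stringsAsFactors = FALSE)

Encoding(df$V1)    # how the string is marked
charToRaw(df$V1)   # the bytes actually stored for the infinity symbol
utf8ToInt(df$V2)   # the code point stored for Zhe

# Printing is where a conversion to the native encoding may happen.
print(df)             # uses print.data.frame()
print(as.matrix(df))  # matrix printing

unlink(tmp)
```

If the bytes are e2 88 9e and the string is marked as UTF-8, the data were
read correctly and any "8" on screen comes from the printing step.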

It is easiest to use only input files in the current native encoding, so one
could convert them before passing them to R and make sure the conversion does
not have similar problems... or use R on a non-Windows platform.
Relying on which R functions/packages happen to work with non-native encodings
may be brittle, but of course any R function that is documented to work with
non-native encodings (like read.table(encoding=)) should do so. If it does not,
it will be fixed following a bug report.
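
As a rough illustration of "making sure the conversion does not have
similar problems", one could test whether strings survive a round trip
through the target encoding. The helper name survives_conversion and the
choice of "latin1" as the target are assumptions made for this sketch only:

```r
# Hypothetical helper: TRUE for elements that come back unchanged after
# UTF-8 -> target encoding -> UTF-8. A failed conversion (NA) or a
# substituted character (such as a "best fit") both yield FALSE.
survives_conversion <- function(x, to = "latin1") {
  x <- enc2utf8(x)
  there <- iconv(x, from = "UTF-8", to = to)
  back <- iconv(there, from = to, to = "UTF-8")
  !is.na(back) & back == x
}

# ASCII text, e-acute, the infinity symbol, and Zhe.
ok <- survives_conversion(c("abc", "\u00e9", "\u221e", "\u0436"))
ok
```

The infinity symbol and Zhe have no latin1 representation, so they are
expected to be flagged, whichever way the platform's iconv handles them.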

I am not sure if that is what you had in mind, but conversion of
character (string) to double is a different matter. as.double() now, as
documented in ?as.double, returns NA for "∞" (on Linux).
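
If the goal is to get Inf out of such input, one option is to map the
symbol explicitly before converting. This is only a sketch; parse_inf is a
hypothetical helper, not an R function, and it handles just the two
spellings shown:

```r
# as.double() does not treat the Unicode infinity symbol as a number.
suppressWarnings(as.double("\u221e"))
as.double("Inf")

# Hypothetical helper: replace the symbol with the spelling that
# as.double() documents, then convert.
parse_inf <- function(x) {
  x <- trimws(enc2utf8(x))
  x[x == "\u221e"] <- "Inf"
  x[x == "-\u221e"] <- "-Inf"
  as.double(x)
}

vals <- parse_inf(c("\u221e", "-\u221e", "1.5"))
vals
```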

Best
Tomas


On 2/7/19 11:17 AM, David Byrne wrote:
> Bug
> Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> file containing the infinity symbol (' ∞ ') results in the infinity
> symbol being imported as the number 8. Other Unicode characters seem
> unaffected, for example, Zhe: ж
>
> Expected Behavior:
> The imported data.frame should represent the infinity symbol as the
> expected 'Inf' so that normal mathematical operations can be processed
>
> Stack Overflow Post:
> I created a question on Stack Overflow where one other member was able
> to reproduce the same issues I was having. This question can be found
> at:
> https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
>
> Method to Reproduce - 1:
> A simple method to reproduce this issue is to use RStudio: in the
> console, type the following:
>> read.table(text=" ∞", encoding="UTF-8")
> The result should be a data.frame with a single value of '8'
>
> Repeating the same with ж results in the correct, expected behavior.
>
> Method to Reproduce - 2:
> Create a .csv file containing the infinity and Zhe characters (I have
> attached the file for convenience; hopefully it is not rejected by your
> email service). Launch an interactive session using
>
>> r --vanilla
> Enter the following statement taking care to replace the
> <path-to-file> with the appropriate one:
>
>> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
>
> This should result in a two element data.frame; the first being the
> incorrect value of 8 with an additional <U+FEFF> and the second the
> correct value of Zhe.
>
> Note the additional <U+FEFF> prefixed to the '8'. This
> appears to be a hidden character that lets editors
> know the encoding. The following link has some explanation; however, it
> states this is caused by Excel. I created the file using
> Notepad, not Excel.
>
> https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
>
> System Details:
> OS:
>> Windows 10.0.17134 Build 17134
>
> R Version:
>> platform       x86_64-w64-mingw32
>> arch           x86_64
>> os             mingw32
>> system         x86_64, mingw32
>> status
>> major          3
>> minor          4.1
>> year           2017
>> month          06
>> day            30
>> svn rev        72865
>> language       R
>> version.string R version 3.4.1 (2017-06-30)
>> nickname       Single Candle


