[Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8
David Byrne
d@v|d@byrne222 @end|ng |rom gm@||@com
Thu Feb 7 11:17:08 CET 2019
Bug
Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
file containing the infinity symbol ('∞') results in the infinity
symbol being imported as the number 8. Other Unicode characters seem
unaffected, for example Zhe: ж
Expected Behavior:
The imported data.frame should represent the infinity symbol as 'Inf'
so that normal mathematical operations can be performed on it.
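As a stopgap for the expectation above, the conversion to 'Inf' could be
done after import. The following is only a minimal sketch: the helper
name parse_infinity is hypothetical, and it assumes the column was read
as character and really contains the infinity sign (U+221E) rather than
an already-substituted '8'.

```r
# Hypothetical helper (not part of base R): turn strings containing the
# infinity sign into numeric values, mapping "∞" to Inf and "-∞" to -Inf.
parse_infinity <- function(x) {
  # Normalise to UTF-8 and drop surrounding whitespace such as " ∞".
  x <- trimws(enc2utf8(as.character(x)))
  # Replace the infinity sign (U+221E) with the literal that as.numeric() accepts.
  x <- gsub("\u221e", "Inf", x, fixed = TRUE)
  as.numeric(x)
}

values <- parse_infinity(c("1.5", " \u221e", "-\u221e"))
print(values)
```

Strings that are neither numbers nor the infinity sign become NA with a
warning, which is the usual as.numeric() behaviour.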
Stack Overflow Post:
I created a question on Stack Overflow where one other member was able
to reproduce the same issue I was having. This question can be found
at:
https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
Method to Reproduce - 1:
A simple method to reproduce this issue is to use RStudio. In the
console, type the following:
> read.table(text=" ∞", encoding="UTF-8")
The result should be a data.frame with a single value of '8'.
Repeating the same with ж results in the correct, expected behavior.
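To confirm which character actually ended up in the imported value, it
can help to look at code points instead of relying on how the console
renders the result. The sketch below is only a diagnostic aid;
describe_chars is a hypothetical helper name, and the \u escapes are
used so the example does not depend on how the console encodes typed
input.

```r
# Hypothetical helper: list the Unicode code points of a single string.
describe_chars <- function(x) {
  sprintf("U+%04X", utf8ToInt(enc2utf8(x)))
}

inf_cp   <- describe_chars("\u221e")  # the infinity sign
eight_cp <- describe_chars("8")       # the digit the report ends up with
zhe_cp   <- describe_chars("\u0436")  # Cyrillic zhe, reported as unaffected

print(c(infinity = inf_cp, eight = eight_cp, zhe = zhe_cp))
```

Calling describe_chars() on the value returned by read.table (after
as.character()) shows whether the cell holds U+221E or has already been
replaced by U+0038.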
Method to Reproduce - 2:
Create a .csv file containing the infinity and Zhe characters (I have
attached the file for convenience; hopefully it is not rejected by your
email service). Launch an interactive session using
> r --vanilla
Enter the following statement taking care to replace the
<path-to-file> with the appropriate one:
> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
This should result in a two-element data.frame: the first element is
the incorrect value of 8 with an additional <U+FEFF>, and the second is
the correct value of Zhe.
Note the additional <U+FEFF> prefixed to the front of the '8'. This
appears to be a hidden character (a byte order mark) that lets editors
know the encoding. The following link has some explanation; however, it
states this is caused by Excel. The file I created was made using
Notepad, not Excel.
https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
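The <U+FEFF> prefix can be checked for and dropped at read time. The
sketch below writes a small stand-in for the attached file to a
temporary path (the byte values are an assumption about its contents:
a UTF-8 byte order mark, the infinity sign, a comma, Zhe, a newline),
tests for the mark with a hypothetical helper has_utf8_bom, and then
reads the file with fileEncoding = "UTF-8-BOM", which R's file
connections accept for reading and which removes the mark. It assumes
a session whose native encoding can represent both characters (for
example a UTF-8 locale), so it is not a demonstration of the Windows
behaviour reported here.

```r
# Stand-in for unicode_chars.csv: BOM, U+221E, ",", U+0436, newline.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF,   # UTF-8 byte order mark
                  0xE2, 0x88, 0x9E,   # infinity sign
                  0x2C,               # comma
                  0xD0, 0xB6,         # Cyrillic zhe
                  0x0A)), tmp)        # newline

# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM bytes.
has_utf8_bom <- function(path) {
  identical(readBin(path, what = "raw", n = 3L),
            as.raw(c(0xEF, 0xBB, 0xBF)))
}
bom_present <- has_utf8_bom(tmp)

# fileEncoding re-encodes the input; "UTF-8-BOM" also strips the mark.
df <- read.table(tmp, sep = ",", fileEncoding = "UTF-8-BOM",
                 colClasses = "character")
first_cell_codepoints <- utf8ToInt(enc2utf8(df$V1[1]))

print(bom_present)
print(first_cell_codepoints)
```

Note that the encoding argument used in the report only marks strings
as UTF-8, while fileEncoding is the argument that re-encodes the file
contents as they are read.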
System Details:
OS:
> Windows 10.0.17134 Build 17134
R Version:
> platform x86_64-w64-mingw32
> arch x86_64
> os mingw32
> system x86_64, mingw32
> status
> major 3
> minor 4.1
> year 2017
> month 06
> day 30
> svn rev 72865
> language R
> version.string R version 3.4.1 (2017-06-30)
> nickname Single Candle