[Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8
peter dalgaard
pdalgd sending from gmail.com
Thu Feb 7 13:55:53 CET 2019
This doesn't seem to happen on macOS, either in Terminal or in RStudio (R 3.5.1, R-devel, R-patched), so it is probably Windows-specific.
-pd
> On 7 Feb 2019, at 11:17 , David Byrne <david.byrne222 using gmail.com> wrote:
>
> Bug
> Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> file containing the infinity symbol ('∞') results in the infinity
> symbol being imported as the number 8. Other Unicode characters seem
> unaffected, for example Zhe: ж
>
> Expected Behavior:
> The imported data.frame should represent the infinity symbol as
> 'Inf' so that normal mathematical operations can be performed on it.
>
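Even on a platform where the symbol is read correctly, read.table would leave '∞' as text rather than converting it to Inf, because the automatic type conversion recognises the string "Inf" but not the symbol. A minimal sketch of an explicit conversion, assuming the column has been read as character; the helper name to_numeric is hypothetical and not part of R:

```r
# Hypothetical helper: map the infinity symbol to "Inf", then convert.
# The symbol is written as the escape \u221e so the example does not
# depend on how the console or the source file is encoded.
to_numeric <- function(s) {
  s <- trimws(s)
  s <- gsub("\u221e", "Inf", s, fixed = TRUE)
  as.numeric(s)
}

x <- c("\u221e", "1.5", "-\u221e")
to_numeric(x)   # Inf, 1.5, -Inf
```

This does not address the Windows behaviour reported here, where the symbol has already become the digit 8 by the time the data frame exists.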
> Stack Overflow Post:
> I created a question on Stack Overflow where one other member was able
> to reproduce the same issues I was having. This question can be found
> at:
> https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
>
> Method to Reproduce - 1:
> A simple way to reproduce this issue is to use RStudio. In the
> console, type the following:
>> read.table(text=" ∞", encoding="UTF-8")
>
> The result is a data.frame with a single value of '8'.
>
> Repeating the same with ж gives the correct, expected behavior.
>
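When reproducing this, it may help to check which characters actually reached the R session, since a symbol typed into a console can in principle be altered before read.table ever sees it. A small sketch using code points; has_infinity is a hypothetical helper, and the characters are written as escapes so the check does not depend on console input:

```r
# Code points of the characters involved in the report.
inf_sym <- "\u221e"   # infinity symbol
zhe     <- "\u0436"   # Cyrillic zhe

utf8ToInt(inf_sym)    # 8734 (U+221E)
utf8ToInt(zhe)        # 1078 (U+0436)
utf8ToInt("8")        # 56, the digit reported in place of the symbol

# Hypothetical helper: TRUE if a string still contains the infinity symbol.
has_infinity <- function(s) grepl("\u221e", s, fixed = TRUE)
has_infinity(inf_sym) # TRUE
has_infinity("8")     # FALSE
```

Passing the escaped form, as in read.table(text = "\u221e", encoding = "UTF-8"), is one way to separate a console input problem from a problem inside read.table.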
> Method to Reproduce - 2:
> Create a .csv file containing the infinity and Zhe characters (I have
> attached the file for convenience; hopefully it is not rejected by your
> email service). Launch an interactive session using
>
>> r --vanilla
>
> Enter the following statement taking care to replace the
> <path-to-file> with the appropriate one:
>
>> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
>
>
> This results in a two-element data.frame: the first element is the
> incorrect value of 8, prefixed with an additional <U+FEFF>, and the
> second is the correct value of Zhe.
>
> Note the additional <U+FEFF> prefixed to the front of the '8'. This
> appears to be a hidden character that lets editors know the encoding.
> The following link has some explanation; however, it states that this
> is caused by Excel. I created the file using Notepad, not Excel.
>
> https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
>
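U+FEFF at the start of a file is a byte order mark (BOM). A minimal sketch of one way to have it dropped on import, using the "UTF-8-BOM" encoding name that R's file connections accept for reading. The temporary file and its ASCII content are made up for illustration, so this shows only the BOM handling and not the infinity symbol problem:

```r
# Write a small CSV that starts with the UTF-8 BOM bytes EF BB BF.
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("1,2\n")), path)

# fileEncoding is passed on to file(); "UTF-8-BOM" removes the mark
# before the fields are parsed.
df <- read.table(path, sep = ",", fileEncoding = "UTF-8-BOM")
df$V1   # 1
df$V2   # 2

unlink(path)
```

Note that fileEncoding re-encodes the input to the session's native encoding, which is a different code path from the encoding= argument used in the report.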
> System Details:
> OS:
>> Windows 10.0.17134 Build 17134
>
>
> R Version:
>> platform x86_64-w64-mingw32
>> arch x86_64
>> os mingw32
>> system x86_64, mingw32
>> status
>> major 3
>> minor 4.1
>> year 2017
>> month 06
>> day 30
>> svn rev 72865
>> language R
>> version.string R version 3.4.1 (2017-06-30)
>> nickname Single Candle
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com