[Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

David Byrne david.byrne222 using gmail.com
Thu Feb 7 14:33:01 CET 2019


I can confirm that it doesn't happen on Ubuntu 18.04.1, so Peter is
most likely correct; it looks like it's Windows-specific.
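
For anyone comparing platforms, a check along the following lines may help. It is only a minimal sketch: `describe_parsed` is a hypothetical helper name (not part of R), and `stringsAsFactors = FALSE` is added so that the column stays character when the symbol survives.

```r
# Hypothetical helper (not part of R): parse a one-cell input with
# read.table() and report what came back in a locale-independent form.
describe_parsed <- function(txt) {
  df <- read.table(text = txt, encoding = "UTF-8", stringsAsFactors = FALSE)
  value <- df[[1]][1]
  # Code points are only meaningful if the value is still a string;
  # U+221E (decimal 8734) means the infinity symbol survived parsing.
  code_points <- if (is.character(value)) utf8ToInt(enc2utf8(value)) else NA_integer_
  list(class = class(value), value = value, code_points = code_points)
}

res <- describe_parsed("\u221e")   # "\u221e" is the infinity symbol
str(res)

# Context that is worth including when reporting the result
print(R.version$platform)
print(Sys.getlocale("LC_CTYPE"))
```

An integer class with value 8 would correspond to the behavior described in the report; a character value with code point 8734 means the symbol was read intact.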

On Thu, 7 Feb 2019 at 12:55, peter dalgaard <pdalgd using gmail.com> wrote:
>
> This doesn't seem to be happening on macOS, either in Terminal or in RStudio (R 3.5.1, R-devel, R-patched). So it is probably Windows-specific.
>
> -pd
>
> > On 7 Feb 2019, at 11:17 , David Byrne <david.byrne222 using gmail.com> wrote:
> >
> > Bug:
> > Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> > file containing the infinity symbol (' ∞ ') results in the infinity
> > symbol being imported as the number 8. Other Unicode characters seem
> > unaffected, for example Zhe: ж
> >
> > Expected Behavior:
> > The imported data.frame should represent the infinity symbol as the
> > expected 'Inf' so that normal mathematical operations can be processed
> >
> > Stack Overflow Post:
> > I created a question on Stack Overflow where one other member was able
> > to reproduce the same issue I was having. This question can be found
> > at:
> > https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
> >
> > Method to Reproduce - 1:
> > A simple method to reproduce this issue is to use RStudio. In the
> > console, type the following:
> >> read.table(text=" ∞", encoding="UTF-8")
> >
> > The result is a data.frame with a single value of '8'.
> >
> > Repeating the same with ж results in the correct, expected behavior.
> >
> > Method to Reproduce - 2:
> > Create a .csv file containing the infinity and Zhe characters (I have
> > attached the file for convenience; hopefully it is not rejected by your
> > email service). Launch an interactive session using
> >
> >> r --vanilla
> >
> > Enter the following statement taking care to replace the
> > <path-to-file> with the appropriate one:
> >
> >> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
> >
> >
> > This should result in a two-element data.frame: the first element is
> > the incorrect value of 8 with an additional <U+FEFF>, and the second
> > is the correct value of Zhe.
> >
> > Note the additional <U+FEFF> prefixed to the front of the '8'. This
> > appears to be a hidden character (the byte order mark) whose purpose
> > is to let editors know the encoding. The following link has some
> > explanation; however, it states that this is caused by Excel. I
> > created the file using Notepad, not Excel.
> >
> > https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
> >
> > System Details:
> > OS:
> >> Windows 10.0.17134 Build 17134
> >
> >
> > R Version:
> >> platform       x86_64-w64-mingw32
> >> arch           x86_64
> >> os             mingw32
> >> system         x86_64, mingw32
> >> status
> >> major          3
> >> minor          4.1
> >> year           2017
> >> month          06
> >> day            30
> >> svn rev        72865
> >> language       R
> >> version.string R version 3.4.1 (2017-06-30)
> >> nickname       Single Candle
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
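
Regarding the <U+FEFF> in the original report: U+FEFF at the start of a file is the byte order mark, which UTF-8 encodes as the bytes EF BB BF. Below is a minimal sketch for checking whether a file starts with one. `has_utf8_bom` is a hypothetical helper (not part of R), and the temporary file is only a stand-in for unicode_chars.csv, whose exact contents are assumed here.

```r
# Hypothetical helper (not part of R): TRUE if the file starts with the
# UTF-8 byte order mark, i.e. the bytes EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Assumed stand-in for unicode_chars.csv:
# BOM, infinity symbol, comma, Zhe, newline
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw(enc2utf8("\u221e,\u0436\n"))), tmp)

# A second file with the same content but no BOM, for comparison
tmp_plain <- tempfile(fileext = ".csv")
writeBin(charToRaw(enc2utf8("\u221e,\u0436\n")), tmp_plain)

print(has_utf8_bom(tmp))
print(has_utf8_bom(tmp_plain))
```

If a BOM is present, ?file documents the encoding name "UTF-8-BOM", which can be passed as `fileEncoding` to read.table() to drop the mark on input. Whether that also changes how the infinity symbol is handled on Windows is not something this thread establishes.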


