[R-SIG-Mac] Bug in reading UTF-16LE file?

Duncan Murdoch murdoch.duncan at gmail.com
Sun Sep 8 11:23:36 CEST 2024


To R-SIG-Mac, with a copy to Jeff Newmiller:

On R-help there's a thread about reading a remote file that is encoded in 
UTF-16LE with a byte-order mark (BOM).  Jeff Newmiller pointed out 
(https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that 
it would be better to declare the encoding as "UTF-16", because the BOM 
will indicate little endian.
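Why the BOM settles the byte order: the BOM is the character U+FEFF written at the start of the file, and its two bytes come out in opposite orders under the two encodings. A quick check with base R's `iconv()`, offered as a sketch:

```r
# The byte-order mark is the character U+FEFF.
bom <- "\ufeff"

# Encode it with an explicit byte order; toRaw = TRUE returns the raw bytes.
le <- iconv(bom, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
be <- iconv(bom, from = "UTF-8", to = "UTF-16BE", toRaw = TRUE)[[1]]

print(le)  # [1] ff fe
print(be)  # [1] fe ff
```

So a file starting with FF FE is little endian, which is what a decoder given plain "UTF-16" is expected to infer from the BOM.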

I tried this on my Mac running R 4.4.1, and it didn't work.  I get the 
same incorrect result from all of these commands:

  # Automatically recognizing a URL and using fileEncoding:
  read.delim(
    'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
    fileEncoding = "UTF-16"
  )

  # Using explicit url() with encoding:
  read.delim(
    url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
        encoding = "UTF-16")
  )

  # Specifying the endianness incorrectly:
  read.delim(
    url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
        encoding = "UTF-16BE")
  )
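A network-free way to check the same behaviour is to write a small tab-delimited file in UTF-16LE with a BOM and read it back under each declared encoding. This is a sketch: the sample data and temporary file are made up for illustration (they are not the contents of employee.txt), and what the "UTF-16" and "UTF-16BE" cases return will depend on the platform's iconv.

```r
# Made-up sample data (not the contents of employee.txt).
sample_text <- "id\tname\n1\tAnn\n2\tBob\n"

# Encode as UTF-16LE and prepend the little-endian BOM (bytes FF FE).
body <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xFF, 0xFE)), body), path)

# Read the file back under each declared encoding; an error is recorded
# as its message instead of stopping the loop.
encodings <- c("UTF-16", "UTF-16LE", "UTF-16BE")
results <- lapply(encodings, function(enc) {
  tryCatch(read.delim(path, fileEncoding = enc),
           error = function(e) conditionMessage(e))
})
names(results) <- encodings
str(results)
```

Comparing `results[["UTF-16"]]` with `results[["UTF-16LE"]]` on a Mac and on another system would show whether the difference is in reading the URL or in decoding.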

The only way I get the correct result is if I specify "UTF-16LE" 
explicitly, whereas Jeff got correct results on several different 
systems using "UTF-16".
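Since an explicit "UTF-16LE" does work, one possible workaround is to peek at the first two bytes and only then choose the encoding. The helper `read_delim_utf16()` below is hypothetical (not part of R); it assumes a local file path, relies on R dropping a leading BOM when the encoding is "UTF-16LE" (as the `?file` help page describes), and falls back to "UTF-16" when no little-endian BOM is present.

```r
# Hypothetical helper (not part of R): pick the UTF-16 byte order from the BOM.
read_delim_utf16 <- function(path, ...) {
  # Peek at the first two bytes without decoding them.
  con <- file(path, open = "rb")
  first <- readBin(con, what = "raw", n = 2)
  close(con)

  # FF FE is the little-endian BOM; R drops it when fileEncoding = "UTF-16LE".
  # Otherwise fall back to letting iconv interpret "UTF-16".
  enc <- if (identical(first, as.raw(c(0xFF, 0xFE)))) "UTF-16LE" else "UTF-16"
  read.delim(path, fileEncoding = enc, ...)
}
```

For the remote file in the thread, one could first download it with `download.file(url, destfile, mode = "wb")` and pass `destfile` to the helper.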

Is this a MacOS bug or an R for MacOS bug?

Duncan Murdoch


