[R-SIG-Mac] Bug in reading UTF-16LE file?
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Sun Sep 8 11:23:36 CEST 2024
To R-SIG-Mac, with a copy to Jeff Newmiller:
On R-help there's a thread about reading a remote file that is encoded in
UTF-16LE with a byte-order mark. Jeff Newmiller pointed out
(https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that
it would be better to declare the encoding as "UTF-16", because the BOM
will indicate little endian.
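As a rough illustration of what the BOM looks like on disk, here is a minimal sketch that builds a small local UTF-16LE file and inspects its first two bytes. The temporary file and its contents are made up for the example; they are not the employee.txt file from the thread.

```r
# Build a tiny tab-delimited sample. For ASCII text, UTF-16LE is each byte
# followed by a zero byte, so the encoding can be done by hand.
txt <- "a\tb\n1\t2\n"
utf16le <- as.vector(rbind(charToRaw(txt), as.raw(0)))

# Prepend the little-endian byte-order mark (FF FE) and write the raw bytes.
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xff, 0xfe)), utf16le), tmp)

# Read back the first two bytes of the file.
bom <- readBin(tmp, what = "raw", n = 2L)
print(bom)
# [1] ff fe   (FF FE = little endian; FE FF would mean big endian)
```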
I tried this on my Mac running R 4.4.1, and it didn't work. I get the
same incorrect result from all of these commands:
# Automatically recognizing a URL and using fileEncoding:
read.delim(
  'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
  fileEncoding = "UTF-16"
)

# Using explicit url() with encoding:
read.delim(
  url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
      encoding = "UTF-16")
)

# Specifying the endianness incorrectly:
read.delim(
  url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
      encoding = "UTF-16BE")
)
The only way I get the correct result is if I specify "UTF-16LE"
explicitly, whereas Jeff got correct results on several different
systems using "UTF-16".
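One possible workaround, sketched here under assumptions, is to look at the BOM yourself and pass an explicit endianness to read.delim(). The helper name read_utf16_delim is hypothetical, and the sample file is a small local stand-in rather than the remote file; only the "UTF-16LE" branch corresponds to what is reported as working above.

```r
# Hypothetical helper: choose an explicit encoding from the first two bytes
# of a local file, then hand the file to read.delim().
read_utf16_delim <- function(path, ...) {
  bom <- readBin(path, what = "raw", n = 2L)
  enc <- if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    "UTF-16LE"   # the case reported as working in this message
  } else if (identical(bom, as.raw(c(0xfe, 0xff)))) {
    "UTF-16BE"   # untested guess for big-endian files
  } else {
    "UTF-16"     # no BOM found: leave the decision to iconv
  }
  read.delim(path, fileEncoding = enc, ...)
}

# Local stand-in file: BOM (FF FE) followed by ASCII text encoded by hand
# as UTF-16LE (each byte followed by a zero byte).
sample_path <- tempfile(fileext = ".txt")
body <- as.vector(rbind(charToRaw("a\tb\n1\t2\n"), as.raw(0)))
writeBin(c(as.raw(c(0xff, 0xfe)), body), sample_path)

df <- read_utf16_delim(sample_path)
print(df)
```

For a remote file, the same helper could be applied after saving a copy locally, for example with download.file(..., mode = "wb"), so that the bytes are not altered in transit.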
Is this a macOS bug or an R for macOS bug?
Duncan Murdoch