[R] Reading a txt file from internet

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Sat Sep 7 23:43:24 CEST 2024


On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote:
> When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it.

That sounds like good advice, but it doesn't work:

  > read.delim(
  +     'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
  +     fileEncoding = "UTF-16"
  + )
  [1] time
  [2] vendor.洀攀琀愀氀........㐀㐀........㜀.㐀㐀........㤀.㐀㐀.㐀..㐀.....㐀..㐀..㔀...㜀.㐀..㠀..㘀...㠀.㐀㐀....㜀...㔀.㐀㐀.

and so on.
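
One way to avoid guessing is to look at the first two bytes of the file and pass "UTF-16LE" only when they really are FF FE. This is a rough sketch, not a tested recipe for the PSU file: the sample data and the helper name utf16le_if_bom() are made up for illustration, and a small local file stands in for the download. It relies on the behaviour documented in ?file that a leading FF FE is dropped when the encoding is "UTF-16LE".

```r
# Hypothetical stand-in for the remote file: a tiny tab-delimited table
# written as UTF-16LE with a leading byte-order mark (FF FE).
txt  <- "time\tvendor\n1\t322\n2\t317\n"
tmp  <- tempfile(fileext = ".txt")
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), tmp)

# Hypothetical helper: return "UTF-16LE" if the file starts with FF FE,
# otherwise "" (which leaves read.delim() on the native encoding).
utf16le_if_bom <- function(path) {
  bom <- readBin(path, what = "raw", n = 2)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) "UTF-16LE" else ""
}

enc <- utf16le_if_bom(tmp)
dat <- read.delim(tmp, fileEncoding = enc)
str(dat)
```

For the real file, the same check could presumably be run on a local copy fetched with download.file(url, destfile, mode = "wb"). A big-endian BOM (FE FF) is not handled here.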
> 
> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann <es using enricoschumann.net> wrote:
>> On Sun, 08 Sep 2024, Christofer Bogaso writes:
>>
>>> Hi,
>>>
>>> I am trying to read the data from
>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt
>>> without any success. Below is the error I am getting:
>>>
>>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt')
>>>
>>> Error in make.names(col.names, unique = TRUE) :
>>>    invalid multibyte string at '<ff><fe>t'
>>> In addition: Warning messages:
>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 1 appears to contain embedded nulls
>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 2 appears to contain embedded nulls
>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 3 appears to contain embedded nulls
>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 4 appears to contain embedded nulls
>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 5 appears to contain embedded nulls
>>>
>>> Is there any way to read this data directly into R?
>>>
>>> Thanks for your time
>>>
>>
>> The <ff><fe> looks like a byte-order mark
>> (https://en.wikipedia.org/wiki/Byte_order_mark).
>> Try this:
>>
>>     fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
>>                encoding = "UTF-16LE")
>>     read.delim(fn)
>>
>


