[R] Problem related to multibyte string in CSV file

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Thu Nov 14 18:49:36 CET 2019


On Thu, 14 Nov 2019 09:34:30 -0800
Dennis Fisher <fisher using plessthan.com> wrote:

> 	Warning message: 
> 	In readLines(FILE, n = 1) : line 1 appears to contain an
> embedded nul 

<...>

> 	print(STRING) 
> 	[1] "\xff\xfet"

Most probably, this means that the FILE is UCS-2LE-encoded (or maybe
UTF-16): the leading "\xff\xfe" is the little-endian byte order mark.
Unlike UTF-8, text encoded using UCS-2LE contains NUL bytes whenever
the code points in question are U+00FF or below, because the high byte
of each two-byte unit is then zero. You should decode the file while
reading it into R; one of the examples in ?readLines shows how to do
it:

# read a 'Windows Unicode' file
A <- readLines(con <- file("Unicode.txt", encoding = "UCS-2LE"))
close(con)
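
Since you want to automate this, here is a rough sketch of checking for
the byte order mark before choosing the encoding. The sample file and
the helper guess_bom_encoding() are made up for illustration and are
not part of R; the sketch only recognises the little-endian mark and
otherwise falls back to the native encoding.

```r
# Hypothetical sample: one CSV header line stored as UTF-16LE with a BOM.
tmp <- tempfile(fileext = ".csv")
payload <- iconv("name,value\n", from = "UTF-8", to = "UTF-16LE",
                 toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), payload), tmp)

# Hypothetical helper: inspect the first two bytes of the file and
# return an encoding name suitable for file(encoding = ).
guess_bom_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 2L)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    return("UTF-16LE")
  }
  "native.enc"  # no little-endian BOM found: do not re-encode
}

enc <- guess_bom_encoding(tmp)

# Decode while reading, so readLines() never sees the raw NUL bytes.
con <- file(tmp, encoding = enc)
decoded <- readLines(con, n = 1L)
close(con)
```

If the files can come in other encodings too, the helper would need
more branches; this one only covers the case shown in your output.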
 
> Now to my question:  I am trying to automate this process and I would
> like to see the output from the print command but without the [1]
> that precedes the string.

Try encodeString combined with cat or message.
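
For example, a minimal sketch (the value of STRING is made up here):
encodeString() produces the same escaped form that print() shows, and
cat() or message() then write it without the [1] prefix.

```r
# Hypothetical input containing a tab that print() would show as \t.
STRING <- "tab\there"

# Escape the string the way print() does, including the double quotes.
shown <- encodeString(STRING, quote = '"')

cat(shown, "\n", sep = "")  # writes to stdout
message(shown)              # writes to stderr
```

Drop the quote argument if you do not want the surrounding quotes.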

-- 
Best regards,
Ivan


