[R] Purpose of readLines(..., encoding=)?

Milan Bouchet-Valat nalimilan at club.fr
Sat Apr 5 12:54:14 CEST 2014


I'm wondering what's the use of the 'encoding' argument to readLines(x),
as opposed to readLines(file(x, encoding=)). The same question applies
to read.table()'s 'encoding' vs 'fileEncoding' arguments. AFAIK only the
latter is able to re-encode the read text into the internal
representation used by R (let's say when reading files in encodings
other than latin1 and UTF-8). But then what's the purpose of the former?

?readLines says:
encoding: encoding to be assumed for input strings.  It is used to mark
          character strings as known to be in Latin-1 or UTF-8: it is
          not used to re-encode the input.  To do the latter, specify
          the encoding as part of the connection ‘con’ or via
          ‘options(encoding=)’: see the example under ‘file’.
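
The difference between marking and re-encoding that the help page describes can be sketched without reading any file. This is only an illustration: the sample bytes ("café" in Latin-1) and the variable names are assumptions, not taken from the help page.

```r
# "café" as Latin-1 bytes; R has no idea what encoding these bytes are in.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x)                     # "unknown"

# Marking: declare the encoding, the bytes stay exactly as they were.
marked <- x
Encoding(marked) <- "latin1"
charToRaw(marked)               # 63 61 66 e9

# Re-encoding: convert the bytes themselves (here to UTF-8).
recoded <- iconv(x, from = "latin1", to = "UTF-8")
charToRaw(recoded)              # 63 61 66 c3 a9
Encoding(recoded)               # "UTF-8"
```

As far as I understand it, readLines(x, encoding=) corresponds to the first operation and file(x, encoding=) to the second.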

But if I have a UTF-8 text file to read, couldn't I use
    readLines(file(x, encoding="UTF-8"))
instead of
    readLines(x, encoding="UTF-8")
and get the same result?

In my experience, the resulting character strings are also marked as
UTF-8 where needed.
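
A minimal way to compare the two calls, assuming a throwaway UTF-8 file created just for the test (the file contents and variable names are hypothetical). What the second call returns depends on the session's locale, since the text is converted to the native encoding, so no output is claimed for it here.

```r
# Write the UTF-8 bytes of "café" plus a newline to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), tmp)

# Variant 1: read the bytes as they are and mark the strings as UTF-8.
marked <- readLines(tmp, encoding = "UTF-8")
Encoding(marked)                # "UTF-8"

# Variant 2: let the connection re-encode from UTF-8 on input.
# The result (and its marking) depends on the native encoding.
con <- file(tmp, encoding = "UTF-8")
recoded <- readLines(con)
close(con)
Encoding(recoded)

unlink(tmp)
```

In a UTF-8 locale I would expect both vectors to print identically, but that is exactly the assumption I would like to have confirmed.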

I'm asking because I need to decide whether users of a tm source
plug-in should be able to pass both arguments (à la 'encoding' vs
'fileEncoding'), or whether I can safely drop the first one.

Thanks for your help
