[R] Purpose of readLines(..., encoding=)?
Milan Bouchet-Valat
nalimilan at club.fr
Sat Apr 5 12:54:14 CEST 2014
Hi!
I'm wondering what the 'encoding' argument to readLines() is for, as
opposed to readLines(file(x, encoding=)). The same question applies to
read.table()'s 'encoding' vs. 'fileEncoding' arguments. AFAIK only the
latter can re-encode the text being read into the internal
representation used by R (say, when reading files in encodings other
than latin1 and UTF-8). But then what's the purpose of the former?
?readLines says:
    encoding: encoding to be assumed for input strings. It is used to mark
              character strings as known to be in Latin-1 or UTF-8: it is
              not used to re-encode the input. To do the latter, specify
              the encoding as part of the connection ‘con’ or via
              ‘options(encoding=)’: see the example under ‘file’.
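To make the "mark" vs. "re-encode" distinction concrete, here is a minimal sketch. The temporary file and its bytes are made up for illustration (the word "café" stored as Latin-1), and the behaviour noted in the comments is what I would expect from the documentation quoted above, not something specific to any package:

```r
# Hypothetical test file: "café" stored as Latin-1 (é is the single byte 0xe9)
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), tmp)

# 'encoding' passed to readLines(): the bytes are kept as they are in the
# file and the string is only marked as Latin-1
marked <- readLines(tmp, encoding = "latin1")
Encoding(marked)
charToRaw(marked)      # still ends in e9

# 'encoding' passed to the connection: the text is converted while reading
con <- file(tmp, encoding = "latin1")
converted <- readLines(con)
close(con)
Encoding(converted)
charToRaw(converted)   # é is now the two-byte UTF-8 sequence c3 a9

unlink(tmp)
```

Both results should print as the same word; the difference only shows up in the underlying bytes and in what Encoding() reports.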
But if I have a UTF-8 text file to read, couldn't I use
    readLines(file(x, encoding="UTF-8"))
instead of
    readLines(x, encoding="UTF-8")
? In my experience, the first form also yields character strings that
are marked as UTF-8 where needed.
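A quick way to check this on a given setup is sketched below. The file is a throwaway one whose bytes I made up for the test ("café" in UTF-8); the names `a` and `b` are just for illustration:

```r
# Hypothetical test file: "café" stored as UTF-8 (é is the byte pair c3 a9)
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), tmp)

# Variant 1: mark the strings as UTF-8 without converting anything
a <- readLines(tmp, encoding = "UTF-8")

# Variant 2: declare the encoding on the (not yet opened) connection
con <- file(tmp, encoding = "UTF-8")
b <- readLines(con)
close(con)

# Compare the declared encodings and the underlying bytes
Encoding(a)
Encoding(b)
identical(charToRaw(a), charToRaw(b))

unlink(tmp)
```

If both variants report the same encoding and the same bytes, the two calls are interchangeable for UTF-8 input, which is the case I am asking about.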
I'm asking because I need to decide whether users of a tm source
plug-in should be able to pass both arguments (à la 'encoding' vs.
'fileEncoding'), or whether I could safely drop the first one.
Thanks for your help