[Rd] locales and readLines
Martin Morgan
mtmorgan at fhcrc.org
Fri Aug 31 18:30:43 CEST 2007
R-developers,
I'm looking for some 'best practices', or perhaps an upstream solution
(I have a deja vu about this, so sorry if it's already been asked).
Problems occur when a file is encoded as latin1 but the user has a
UTF-8 locale (or, I guess, more generally whenever the encoding of the
input does not match the encoding of R's locale). Here are two examples
from the Bioconductor help list:
https://stat.ethz.ch/pipermail/bioconductor/2007-August/018947.html
(the relevant command is library(GEOquery); gse <- getGEO('GSE94'))
https://stat.ethz.ch/pipermail/bioconductor/2007-July/018204.html
I think solutions are:
* Specify the encoding in readLines.
* Convert the input using iconv.
* Tell the user to set their locale to match the input file (!)
Unfortunately, these (the first two, anyway) place an extra burden on
the package author, who has to learn about locales, the encoding
conventions of the files being read, and how R deals with encodings.
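
For concreteness, here is a minimal sketch of what the first two options might look like. It assumes the input really is latin1 and that R is running in a UTF-8 locale; the temporary file and the variable names are made up for illustration. Note that the encoding is declared on the connection handed to readLines, not on the file name.

```r
# Write a small latin1-encoded file to stand in for the problematic input.
# The bytes spell "caf" followed by 0xe9 (e-acute in latin1) and a newline.
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)

# Option 1: declare the encoding on the connection, so that R re-encodes
# the input to the native encoding as it reads.
con <- file(path, encoding = "latin1")
via_connection <- readLines(con)
close(con)

# Option 2: read the bytes unchanged, then convert explicitly with iconv().
raw_lines <- readLines(path, warn = FALSE)
via_iconv <- iconv(raw_lines, from = "latin1", to = "UTF-8")
```

Either way, the package author still has to know that the file is latin1, which is exactly the burden described above.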
Are there other / better solutions? Any chance for some (additional)
'smarts' when reading files?
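
One form such 'smarts' could take on the package side is a fallback heuristic: treat each line as UTF-8 if it validates, otherwise assume some other encoding. The sketch below is only an illustration of that idea. read_lines_guess is a hypothetical helper, not an existing R function, and the latin1 fallback is an assumption that will be wrong for some files. It relies on validUTF8(), which is available in R 3.3.0 and later.

```r
# Hypothetical helper: read lines, keeping valid UTF-8 as-is and
# converting anything else from an assumed fallback encoding.
read_lines_guess <- function(path, fallback = "latin1") {
  lines <- readLines(path, warn = FALSE)
  bad <- !validUTF8(lines)
  # Re-encode only the lines that are not valid UTF-8.
  lines[bad] <- iconv(lines[bad], from = fallback, to = "UTF-8")
  # Mark the remaining lines as UTF-8 (a no-op for pure ASCII).
  Encoding(lines[!bad]) <- "UTF-8"
  lines
}

# Example input: "caf" + e-acute once as UTF-8 bytes, once as latin1 bytes.
mixed <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a,
                  0x63, 0x61, 0x66, 0xe9, 0x0a)), mixed)
out <- read_lines_guess(mixed)
```

A heuristic like this cannot tell latin1 apart from other single-byte encodings, so it is at best a convenience and not a replacement for knowing the encoding of the file.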
Martin
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org