[Rd] latin1,utf-8...encoding and data
Stéphane Dray
dray at biomserv.univ-lyon1.fr
Wed Oct 18 17:02:25 CEST 2006
Hello,
I have some questions concerning encoding and package distribution. We
develop the ade4 package. Some data sets included in the package contain
accented characters (e.g. é, è...). The data sets were saved using the
latin1 encoding, but some of us use utf-8 and cannot display the data
sets that contain accented characters.
e.g:
library(ade4)
data(rankrock)
rankrock
In this case, the characters are in the row names. Other data sets have
such characters in the data themselves (e.g. levels of factors...). A
solution is to use iconv... This is quite easy for us but perhaps more
difficult for a user who may have no idea of the problem. The problem is
quite marginal for the moment, but some Linux distributions use utf-8 by
default (e.g. Ubuntu), and I suppose that it will become more and more
common in the future.
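To make the iconv workaround concrete, here is a minimal sketch of what a user would have to do by hand. The helper name latin1_to_utf8 and the toy data frame are made up for illustration (they are not part of ade4, and this is not the rankrock data). The sketch assumes the text really is latin1 and only touches row names, column names, factor levels and character columns.

```r
# Hypothetical helper (not part of ade4): convert latin1 text stored in a
# data frame to UTF-8 using iconv().
latin1_to_utf8 <- function(df) {
  conv <- function(x) iconv(x, from = "latin1", to = "UTF-8")
  rownames(df) <- conv(rownames(df))
  names(df) <- conv(names(df))
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      # Only the levels carry text; the integer codes are left untouched.
      levels(df[[j]]) <- conv(levels(df[[j]]))
    } else if (is.character(df[[j]])) {
      df[[j]] <- conv(df[[j]])
    }
  }
  df
}

# Toy data built from raw latin1 bytes (0xe9 is e-acute, 0xe8 is e-grave),
# so the example does not depend on the encoding of this message.
e_acute <- rawToChar(as.raw(0xe9))
e_grave <- rawToChar(as.raw(0xe8))
pre <- paste0("pr", e_acute)

# Levels are given explicitly so that no sorting of the raw strings is needed.
toy <- data.frame(cover = factor(c(pre, "bois"), levels = c(pre, "bois")))
rownames(toy) <- c(paste0("gr", e_grave, "s"), "sable")

fixed <- latin1_to_utf8(toy)
```

The converted object could then be written back with save() so that utf-8 users get readable names, but latin1 users would then face the mirror-image problem, which is why a proper solution is needed.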
So we wonder if there is a proper way to encode and save these data sets.
I have found some documents by B. Ripley and this note:
http://developer.r-project.org/210update.txt
- Names in data objects (e.g. in .rda files) are problematic. It
is likely that by release time these will be treated as in
Latin-1.
Unless I am mistaken, I did not find an answer to this problem there.
What are the plans of the R gurus on this question?
Thanks a lot.
Sincerely.
Please include my address in replies, as I am not a subscriber to this list.
--
Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/