[Rd] latin1,utf-8...encoding and data

Stéphane Dray dray at biomserv.univ-lyon1.fr
Wed Oct 18 17:02:25 CEST 2006


Hello,
I have some questions concerning encoding and package distribution. We 
develop the ade4 package. Some data sets included in the package 
contain accented characters (e.g. é, è...). The data sets were saved 
using the latin1 encoding, but some of us use UTF-8 and cannot display 
the data sets that contain accented characters.
e.g.:

library(ade4)
data(rankrock)
rankrock

In this case, the accented characters are in the row names. Other data 
sets have such characters in the data themselves (e.g. levels of 
factors). A solution is to use iconv. This is quite easy for us, but 
perhaps more difficult for a user who may have no idea of the problem. 
The problem is quite marginal for the moment, but some Linux 
distributions use UTF-8 by default (e.g. Ubuntu), and I suppose that it 
will become more and more common in the future.
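
A minimal sketch of the iconv workaround mentioned above. It assumes the
strings really are Latin-1; the helper name to_utf8 and the toy data frame
are hypothetical and stand in for a data set such as rankrock, they are not
part of ade4.

```r
# Hypothetical helper (not part of ade4): re-encode the row names, factor
# levels and character columns of a data frame from Latin-1 to UTF-8.
to_utf8 <- function(df, from = "latin1") {
  conv <- function(x) iconv(x, from = from, to = "UTF-8")
  rownames(df) <- conv(rownames(df))
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      levels(df[[j]]) <- conv(levels(df[[j]]))
    } else if (is.character(df[[j]])) {
      df[[j]] <- conv(df[[j]])
    }
  }
  df
}

# Toy stand-in for a data set saved with Latin-1 strings (illustrative only).
rn <- c("Ren\xe9", "Zo\xe9")
lv <- c("for\xeat", "pr\xe9")
Encoding(rn) <- "latin1"
Encoding(lv) <- "latin1"
toy <- data.frame(habitat = factor(lv), row.names = rn)

toy_utf8 <- to_utf8(toy)
Encoding(rownames(toy_utf8))   # inspect the encoding marks after conversion
```

A user would still have to know that the conversion is needed, which is
the difficulty described above.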

So we wonder whether there is a proper way to encode and save these 
data sets. I have found some documents by B. Ripley and this note:

http://developer.r-project.org/210update.txt

  -  Names in data objects (e.g. in .rda files) are problematic.  It
     is likely that by release time these will be treated as in
     Latin-1.

Unless I am mistaken, I have not found an answer to this problem.

What are the plans of the R gurus on this question?
Thanks a lot.
Sincerely.

Please add my address to your replies, as I am not a subscriber to this list.


-- 
Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57       Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/



