[Rd] latin1,utf-8...encoding and data
Stéphane Dray
dray at biomserv.univ-lyon1.fr
Thu Oct 19 09:46:49 CEST 2006
Thanks a lot for this clear answer. So there is no way to preserve our
French cultural exception (accented characters) if we want to be
international... I had thought that adding an encoding parameter to the
data function (e.g. data(mydata, encoding="latin1")), like the one in
the function 'file', could be a way to solve the problem. Apparently,
the problem is much more complicated...
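
For anyone who hits the same problem, the iconv workaround mentioned in my original message (quoted below) might look roughly like the following. This is only a sketch under assumptions: the byte strings are made-up examples standing in for Latin-1 data, and to_utf8 is a hypothetical helper, not a function of R or of ade4.

```r
# Illustrative strings stored as Latin-1 bytes (0xe8 = e-grave, 0xe9 = e-acute).
# They stand in for row names or factor levels saved under latin1.
x_latin1 <- c(rawToChar(as.raw(c(0x67, 0x72, 0xe8, 0x73))),
              rawToChar(as.raw(c(0xe9, 0x74, 0xe9))))

# Re-encode from Latin-1 to UTF-8 so that a utf-8 session can display them.
x_utf8 <- iconv(x_latin1, from = "latin1", to = "UTF-8")

# Hypothetical helper (an assumption, not part of any package): re-encode
# the row names and the factor levels of a data frame.
to_utf8 <- function(df, from = "latin1") {
  rownames(df) <- iconv(rownames(df), from = from, to = "UTF-8")
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      levels(df[[j]]) <- iconv(levels(df[[j]]), from = from, to = "UTF-8")
    }
  }
  df
}
```

A user in a utf-8 session could then try something like rankrock <- to_utf8(rankrock) after data(rankrock), assuming the data set is a data frame. It does not address the portability point raised in the reply: the converted strings are still non-ASCII.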
Sincerely.
Prof Brian Ripley wrote:
> Only ASCII letters are portable: those accented characters do not even
> exist in many of the encodings used for R, e.g. Russian and Japanese
> on Windows machines.
>
> There is no way to associate an encoding with a character string in
> R. We considered it, but it would have had severe back-compatibility
> problems and little advantage (you cannot display non-ASCII character
> strings portably: even if you have a Unicode encoding you still need
> to select a suitable font).
>
> 'B. Ripley' (sic)
>
>
> On Wed, 18 Oct 2006, Stéphane Dray wrote:
>
>> Hello,
>> I have some questions concerning encoding and package distribution. We
>> develop the ade4 package. Some data sets included in the package
>> contain accented characters (e.g. é, è...). The data sets were
>> saved using latin1 encoding, but some of us use utf-8 and cannot display
>> the data sets that contain accented characters.
>> e.g:
>>
>> library(ade4)
>> data(rankrock)
>> rankrock
>>
>> In this case, the characters are in the row names. Other data sets have such
>> characters in the data (e.g. levels of factors...). A solution is to use
>> iconv... this is quite easy for us but perhaps more difficult for a user
>> who may have no idea of the problem. This problem is quite marginal
>> for the moment, but some Linux distributions are utf-8 by default (e.g.
>> Ubuntu) and I suppose that the problem will become more and more common in
>> the future.
>>
>> So we wonder if there is a proper way to code and save these data sets.
>> I have found some documents by B. Ripley and this note:
>>
>> http://developer.r-project.org/210update.txt
>>
>> - Names in data objects (e.g. in .rda files) are problematic. It
>> is likely that by release time these will be treated as in
>> Latin-1.
>>
>> Unless I am mistaken, this does not give an answer to the problem.
>>
>> What are the plans of the R gurus on this question?
>> Thanks a lot.
>> Sincerely.
>>
>> Please include my address in replies, as I am not a subscriber to this list.
>>
>>
>>
>
--
Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/