[Rd] Small encoding question

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Feb 15 04:53:31 CET 2008


Have you set R_ENCODING_LOCALES? That is how you tell R which locales to use
for latin1 and UTF-8 when checking. Details are in R-exts.texi.

As it works for me in a 'C' locale on Leopard with R-devel without setting
this, I can't reproduce the problem to check whether setting it helps.

For l10n_info(), R asks the system via nl_langinfo(). It looks like Darwin
uses unusual charset names: it reports ISO8859-1, whereas we look for (the
more correct) ISO-8859-1. I've 'hot fixed' that.
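To illustrate the naming mismatch, here is a minimal sketch, assuming a POSIX shell, of a more tolerant comparison that ignores case, hyphens and underscores, so that Darwin's "ISO8859-1" and "ISO-8859-1" are treated alike. The helpers normalize_charset and is_latin1 are hypothetical and are not the change actually made in R; the script also assumes that `locale charmap` reports the same codeset name that nl_langinfo(CODESET) returns to R.

```sh
#!/bin/sh
# Hypothetical helpers, not part of R: compare charset names loosely so that
# Darwin's "ISO8859-1" and the more standard "ISO-8859-1" are treated alike.

# Lower-case a charset name and strip any '-' and '_' characters.
normalize_charset() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[-_]//g'
}

# Return success (0) if the name denotes Latin-1, failure (1) otherwise.
is_latin1() {
    case "$(normalize_charset "$1")" in
        iso88591|latin1) return 0 ;;
        *) return 1 ;;
    esac
}

# Codeset of the current locale as reported by the locale utility
# (assumed here to match what nl_langinfo(CODESET) gives R).
codeset=$(locale charmap 2>/dev/null)
if is_latin1 "$codeset"; then
    echo "'$codeset' is Latin-1"
else
    echo "'$codeset' is not Latin-1"
fi
```

Run with LANG=en_US.ISO8859-1 on a system where that locale is installed, this should classify Darwin's spelling as Latin-1. Normalizing the name, rather than listing every vendor spelling, keeps the check short, but it remains a heuristic on names.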


On Thu, 14 Feb 2008, Simon Urbanek wrote:

> I think I found the cause, but a proper fix may be more complicated
> (beyond a hot fix for this particular case).
>
> What it boils down to is that the code in .check_package_code_syntax()
> tries to change the locale in a way that doesn't work. In addition,
> the output of l10n_info() is wrong (for some definition of wrong),
> which complicates things even further.
>
> To top it all off, everything works fine when run in a UTF-8 locale;
> that's why the package passes check on a "regular" OS X setup, since
> a UTF-8 locale has been the default since Leopard.
>
> .check_package_code_syntax() sees that the sources are declared as
> Latin-1, so it checks whether the current locale is UTF-8. It isn't
> (because we force C), so it switches to en_US. This may be the first
> problem, because en_US is not necessarily a Latin-1 locale at all (on
> OS X, en_US.ISO8859-1 would be). The next problem is that l10n_info()
> returns FALSE even for the (correct) Latin-1 locale, and
> consequently(?) reading the files fails.
>
> ginaz:~$ echo 'Sys.getlocale(); l10n_info()'|LANG=en_US.ISO8859-1 R --vanilla --slave
> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1"
> $MBCS
> [1] FALSE
>
> $`UTF-8`
> [1] FALSE
>
> $`Latin-1`
> [1] FALSE
>
> en_US.ISO8859-1 *is* a Latin-1 locale. I looked hard and found no
> way to link (installed) locales to encodings: there is no official
> mapping, and POSIX allows arbitrary locales (and names). Hence locale
> names are merely loose conventions, so I'm not sure how R can even
> make such a decision (other than by parsing the name?).
>
> Anyway, a quick fix would be to force the en_US.UTF-8 locale in that
> check on Mac OS X, but I don't think that fixes the underlying
> problems.
>
> Cheers,
> Simon
>
>
> On Feb 14, 2008, at 3:09 PM, Simon Urbanek wrote:
>
>>
>> On Feb 14, 2008, at 2:45 PM, Kurt Hornik wrote:
>>
>>>>>>>> Vincent Goulet writes:
>>>
>>>> Dear developeRs,
>>>> Compilation of the latest version (0.9-5) of my actuar package fails
>>>> for r-release MacOS_X ix86 on CRAN; see
>>>
>>>> 	http://www.R-project.org/nosvn/R.check/r-release-macosx-ix86/actuar-00check.html
>>>
>>>> All errors come from accented letters in comments in Latin-1-encoded
>>>> files (except hierarc.R, which is in UTF-8, my bad). The encoding is
>>>> declared as Latin-1 in DESCRIPTION.
>>>
>>>> The package checks and compiles fine on Windows, Linux and,
>>>> ironically, my MacOS X main development machine. I realize using
>>>> non-ASCII characters in source files is not a good idea and I removed
>>>> them, but I would appreciate any clue as to what went wrong with the
>>>> compilation on CRAN.
>>>
>>> I assume that the MacOS X builds are done in a C locale?
>>>
>>
>> Yes, but isn't this very similar to the problem we talked about a
>> while back? The check was reporting an error although the code was
>> fine (I think it boiled down to text-connection I/O in the check
>> scripts failing mysteriously because the wrong encoding was being
>> used). I'll have to check later today.
>>
>> Cheers,
>> S
>>
>>
>>>
>>>> FWIW,
>>>
>>>>> sessionInfo()
>>>> R version 2.6.2 (2008-02-08)
>>>> i386-apple-darwin8.10.1
>>>
>>>> locale:
>>>> fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
>>>
>>>> attached base packages:
>>>> [1] stats     utils     datasets  grDevices graphics  methods   base
>>>
>>>> other attached packages:
>>>> [1] CarbonEL_0.1-4
>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.6.2
>>>
>>>> Thanks in advance!
>>>
>>>> ---
>>>>  Vincent Goulet, Associate Professor
>>>>  École d'actuariat
>>>>  Université Laval, Québec
>>>>  Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca
>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

