[Rd] Making iconv portable?
Simon Urbanek
simon.urbanek at r-project.org
Mon Dec 15 19:49:16 CET 2014
On Dec 15, 2014, at 1:37 PM, Spencer Graves <spencer.graves at prodsyse.com> wrote:
>
>
>> On Dec 15, 2014, at 10:13 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>>
>>>
>>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik <Kurt.Hornik at wu.ac.at> wrote:
>>>
>>>>>>>> Spencer Graves writes:
>>>
>>>> Hello, All:
>>>> What would it take to make “iconv” portable?
>>>
>>>
>>>> I ask, because I want to convert accented characters to
>>>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>>> Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
>>>> "", "ASCII//TRANSLIT")'. This worked under Windows but failed
>>>> on Linux and Mac. It’s part of the “subNonStandardCharacters”
>>>> function in the Ecfun package. The development version on
>>>> R-Forge uses this and returns “Raul” under Windows and NA
>>>> under Mac OS X (and presumably also Linux).
>>>
>>> Hmm.
>>>
>>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>>> [1] "Raul"
>>>
>>> seems to work for me on Linux ...
>>>
>>
>> also on OS X:
>>
>>> iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Ra'ul"
>
>
> Thanks for the replies. I should have checked my examples more carefully. Consider the following example and a slight modification of one from help("iconv"):
>
>
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > Encoding(x) <- "latin1"
> > x
> [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
> > iconv(x, "latin1", "ASCII//TRANSLIT") # platform-dependent
> [1] "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
> >
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > x
> [1] "Ekstr\xf8m" "J\xf6reskog" "bi\xdfchen Z\xfcrcher"
> > iconv(x, "", "ASCII//TRANSLIT") # platform-dependent
> [1] NA NA NA
>
>
> This suggests a two-step fix to my problem: (1) Check Encoding(x) and set to “latin1” if it’s “unknown”.
Well, that depends heavily on your source. In the example above the input is hand-crafted latin1, so if you don't declare it, the native encoding is assumed. That can be anything, and it has nothing to do with the actual encoding of your input, which in this particular case you constructed by hand as latin1.
> (2) Delete any new \" added by iconv.
>
The whole point of translit is to create combinations of ASCII characters that represent the Unicode characters, so " is just one of many characters that can be used.
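For reference, the two steps discussed above (declare the input encoding, then clean up what //TRANSLIT produces) might be sketched in R as follows. This is a minimal, hypothetical sketch, not Ecfun's subNonStandardCharacters: the name strip_accents_sketch is made up, it assumes that any string whose encoding is "unknown" really is latin1 (true only for hand-crafted input like the example above), and the set of marker characters it strips (" ' ` ^ ~) is a guess extrapolated from outputs such as "J\"oreskog" and "Ra'ul" seen in this thread. The transliteration itself remains platform-dependent.

```r
# Hypothetical helper, not part of base R or Ecfun.
strip_accents_sketch <- function(x, assumed = "latin1",
                                 markers = "[\"'`^~]") {
  # Step 1: mark strings whose encoding R does not know as `assumed`.
  # ASSUMPTION: unmarked input really is in that encoding.
  unknown <- Encoding(x) == "unknown"
  Encoding(x)[unknown] <- assumed
  # enc2utf8() honours the declared encoding marks; then transliterate.
  # What //TRANSLIT produces (or whether it gives NA) is platform-dependent.
  y <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII//TRANSLIT")
  # Step 2: drop characters some iconv implementations insert for accents.
  # ASSUMPTION: this marker set; it also deletes genuine quotes/apostrophes.
  gsub(markers, "", y)
}

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
strip_accents_sketch(x)
```

Stripping markers trades one problem for another: as noted above, any ASCII punctuation may be used by a transliterator, and real apostrophes (as in "O'Brien") are removed as well.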
Cheers,
S
>
> Thanks again,
> Spencer
>
>>
>>
>>
>>> -k
>>>
>>>
>>>> The “iconv” R code merely calls compiled code, which I’ve used very little in 30 years.
>>>
>>>
>>>> Thanks,
>>>> Spencer
>>>
>>>
>>>
>>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves <spencer.graves at structuremonitoring.com> wrote:
>>>>>
>>>>> Wonderful. Thanks very much. Spencer
>>>>>
>>>>>
>>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
>>>
>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>