[Rd] Making iconv portable?
Milan Bouchet-Valat
nalimilan at club.fr
Mon Dec 15 23:10:14 CET 2014
On Monday, 15 December 2014 at 13:49 -0500, Simon Urbanek wrote:
> On Dec 15, 2014, at 1:37 PM, Spencer Graves <spencer.graves at prodsyse.com> wrote:
> >
> >
> >> On Dec 15, 2014, at 10:13 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
> >>
> >>>
> >>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik <Kurt.Hornik at wu.ac.at> wrote:
> >>>
> >>>>>>>> Spencer Graves writes:
> >>>
> >>>> Hello, All:
> >>>> What would it take to make “iconv” portable?
> >>>
> >>>
> >>>> I ask, because I want to convert accented characters to
> >>>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> >>>> Milan Bouchet-Valat suggested on R-help that I use 'iconv(x,
> >>>> "", "ASCII//TRANSLIT")'. This worked under Windows but failed
> >>>> on Linux and Mac. It’s part of the “subNonStandardCharacters”
> >>>> function in the Ecfun package. The development version on
> >>>> R-Forge uses this and returns “Raul” under Windows and NA
> >>>> under Mac OS X (and presumably also Linux).
> >>>
> >>> Hmm.
> >>>
> >>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> >>> [1] "Raul"
> >>>
> >>> seems to work for me on Linux ...
> >>>
> >>
> >> also on OS X:
> >>
> >>> iconv("Raúl", "", "ASCII//TRANSLIT")
> >> [1] "Ra'ul"
> >
> >
> > Thanks for the replies. I should have checked my examples more carefully. Consider the following example and a slight modification from help("iconv"):
> >
> >
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > Encoding(x) <- "latin1"
> > > x
> > [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
> > > iconv(x, "latin1", "ASCII//TRANSLIT") # platform-dependent
> > [1] "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
> > >
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > x
> > [1] "Ekstr\xf8m" "J\xf6reskog" "bi\xdfchen Z\xfcrcher"
> > > iconv(x, "", "ASCII//TRANSLIT") # platform-dependent
> > [1] NA NA NA
> >
> >
> > This suggests a two-step fix to my problem: (1) Check Encoding(x) and set to “latin1” if it’s “unknown”.
>
> Well, that depends heavily on your source. In the example above the input is hand-crafted latin1, so if you don't declare it, the native encoding will be assumed. That can be anything and has nothing to do with your actual input in this particular case.
>
>
> > (2) Delete any new \" added by iconv.
> >
>
> The whole point of translit is to create combinations of ASCII
> characters that represent the Unicode characters, so " is just one of
> many characters that can be used.
But it's quite unexpected that ö is transliterated to "o and ú to 'u.
Looks like iconv on OS X has a different idea of what ASCII
transliteration means than on Linux and Windows...
Anyway it's easy to remove " and ' if needed.
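A minimal sketch of that clean-up, assuming the input really is latin1 and that the strings contain no genuine quotes or apostrophes you want to keep. The names strip_translit_marks and to_plain_ascii are made up for illustration; they are not part of base R or Ecfun. The result of the iconv step is still platform-dependent, so I give no expected output for it:

```r
# Hypothetical helper: drop the " and ' characters that some iconv
# implementations emit in place of accents (e.g. J"oreskog, Ra'ul).
# Caveat: this also deletes genuine quotes, so "O'Brien" becomes "OBrien".
strip_translit_marks <- function(y) {
  gsub("[\"']", "", y)  # NA elements stay NA
}

# Hypothetical wrapper: transliterate to ASCII, then strip the marks.
# 'from' is an assumption about the data source and must match the
# actual bytes; "" would mean the native encoding instead.
to_plain_ascii <- function(x, from = "latin1") {
  y <- iconv(x, from, "ASCII//TRANSLIT")  # platform-dependent result
  strip_translit_marks(y)
}

# Example input taken from help("iconv")
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
to_plain_ascii(x)
```

Strings that iconv cannot convert come back as NA and are left as NA, so they can still be checked afterwards.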
Regards
> Cheers,
> S
>
>
> >
> > Thanks again,
> > Spencer
> >
> >>
> >>
> >>
> >>> -k
> >>>
> >>>
> >>>> The “iconv” R code merely calls compiled code, which I’ve used very little in 30 years.
> >>>
> >>>
> >>>> Thanks,
> >>>> Spencer
> >>>
> >>>
> >>>
> >>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves <spencer.graves at structuremonitoring.com> wrote:
> >>>>>
> >>>>> Wonderful. Thanks very much. Spencer
> >>>>>
> >>>>>
> >>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
> >>>
> >>>
> >>>> ______________________________________________
> >>>> R-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel