[R] Comparing Latin characters with and without accents?

Ista Zahn istazahn at gmail.com
Mon Dec 15 19:52:53 CET 2014


On Mon, Dec 15, 2014 at 12:33 AM, Spencer Graves
<spencer.graves at prodsyse.com> wrote:
> Hello, All:
>
>
>           What do people do to strip accents from Latin characters, returning vanilla ASCII?

I find the stringi package works well for this sort of thing, e.g.,

library(stringi)
# Printable ASCII followed by the Latin-1 supplement
x <- c("!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
  "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":",
  ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H",
  "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
  "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d",
  "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
  "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "-", " ",
  "¡", "¢", "£", "¤", "¥", "¦", "§", "¨", "©", "ª", "«", "¬", "­", "®",
  "¯", "°", "±", "²", "³", "´", "µ", "¶", "·", "¸", "¹", "º", "»", "¼",
  "½", "¾", "¿", "À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê",
  "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×", "Ø",
  "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß", "à", "á", "â", "ã", "ä", "å", "æ",
  "ç", "è", "é", "ê", "ë", "ì", "í", "î", "ï", "ð", "ñ", "ò", "ó", "ô",
  "õ", "ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ")
# Show each character next to its ASCII transliteration
cbind(x, stri_trans_general(x, "Latin-ASCII"))
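
For the specific case in the question, a shorter check may be easier to read. This is only a minimal sketch: the variable names are made up for illustration, and the accented letters are written as \u escapes so the example does not depend on the encoding of the script file.

```r
library(stringi)

# Illustrative input; \u00fa is "ú", \u00e7 is "ç", \u00eb is "ë"
accented <- c("Ra\u00fal", "fa\u00e7ade", "Zo\u00eb")

# "Latin-ASCII" is the ICU transliterator used in the call above
plain <- stri_trans_general(accented, "Latin-ASCII")

print(plain)  # expected: Raul, facade, Zoe
```

Because stringi relies on ICU rather than the system iconv, I would expect the same result on Windows, Linux and Mac.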

Best,
Ista
>
>
>           For example, I want to convert "Raúl" to "Raul".  Milan (below) suggested iconv(x, "", "ASCII//TRANSLIT").  This worked under Windows but failed on Linux and Mac.  It's part of the "subNonStandardCharacters" function in the Ecfun package.  The development version on R-Forge uses this and returns "Raul" under Windows and NA under Mac OS X (and something different from "Raul", presumably NA, under Linux).
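
If you want to keep iconv() for some reason, one option is to at least make the failure visible instead of silently getting NA. The following is a sketch under assumptions, not a fix: to_ascii_checked is a hypothetical helper (it is not part of Ecfun or base R), and it assumes the input can be converted to UTF-8 first. What ASCII//TRANSLIT actually returns still depends on the platform's iconv, so no particular output is promised here.

```r
# Hypothetical helper, named only for this illustration.
to_ascii_checked <- function(x) {
  x <- enc2utf8(x)  # assumption: input can be represented as UTF-8
  y <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")

  # iconv() returns NA for strings it could not convert
  failed <- is.na(y) & !is.na(x)
  if (any(failed)) {
    warning(sum(failed), " string(s) could not be transliterated; left unchanged")
    y[failed] <- x[failed]
  }
  y
}

res <- to_ascii_checked(c("Ra\u00fal", "plain"))
```

The warning tells you which platform behaviour you hit, and the untouched originals can then be passed to another method.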
>
>
>           Thanks,
>           Spencer
>
>
>> On Nov 30, 2014, at 2:32 AM, Spencer Graves <spencer.graves at structuremonitoring.com> wrote:
>>
>> Wonderful.  Thanks very much.  Spencer
>>
>>
>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
>>> On Sunday, 30 November 2014 at 02:14 -0800, Spencer Graves wrote:
>>>> Hello:
>>>>
>>>>
>>>>        How can one convert Latin characters with accents to the corresponding
>>>> characters without?  For example, I want to convert "ú" to "u", similar
>>>> to how tolower('U') returns "u".
>>>>
>>>>
>>>>        This can be done using chartr{base}, e.g., chartr('ú', 'u',
>>>> 'Raúl') returns "Raul".  However, I wondered if a simpler version of
>>>> this is available.
>>> This appears to work:
>>> > iconv("ù", "", "ASCII//TRANSLIT")
>>> [1] "u"
>>>
>>>
>>> Regards
>>>
>>>>        Thanks,
>>>>        Spencer
>>>>
>>>>
>>>> p.s.   findFn('convert to ascii') found 117 help pages in 70 packages.
>>>> A brief review identified two to "Convert to ASCII": ASCIIfy {gtools}
>>>> and stri_enc_toascii {stringi}.  Neither of these did what I expected.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


