[R] remove accents in strings

David Winsemius dwinsemius at comcast.net
Tue Sep 7 20:01:58 CEST 2010


On Sep 7, 2010, at 1:35 PM, Matt Shotwell wrote:

> If you know the encoding of the string, or if its encoding is the
> current locale encoding, then you can use the iconv function to convert
> the string to ASCII. Something like:
>
> iconv(accented.string, to="ASCII//TRANSLIT")
>
> While 7-bit ASCII does not permit accented characters, extended (8-bit)
> ASCII does. Hence, I'm not sure this will work. But it's worth a try.

 > tst <- c("à", "è", "ì", "ò", "ù" , "À", "È", "Ì", "Ò", "Ù", "á", "é", "í", "ó", "ú", "ý" , "Á", "É", "Í", "Ó", "Ú", "Ý")
 > iconv(tst, to="ASCII//TRANSLIT")
 [1] "`a" "`e" "`i" "`o" "`u" "`A" "`E" "`I" "`O" "`U" "'a" "'e" "'i" "'o" "'u" "'y"
[17] "'A" "'E" "'I" "'O" "'U" "'Y"
 > gsub("`|\\'", "", iconv(tst, to="ASCII//TRANSLIT"))
 [1] "a" "e" "i" "o" "u" "A" "E" "I" "O" "U" "a" "e" "i" "o" "u" "y" "A" "E" "I" "O"
[21] "U" "Y"

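Notice that the acute accent gets converted to a single quote, and the
grave accent to a backtick, so both have to be stripped afterwards. The
pattern above writes the quote double-backslashed as \\', although a plain
' should match just as well, since the single quote is not a regex
metacharacter.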
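The two steps shown above can be wrapped into one reusable function. This is a minimal sketch, not an established API: the name strip_accents and its from argument are hypothetical, the input is assumed to be UTF-8, and the extra marks in the pattern (caret, tilde, double quote) are an assumption about what some iconv implementations, such as macOS's libiconv, may emit for circumflex, tilde and diaeresis.

```r
# Hypothetical helper wrapping the iconv() + gsub() steps from the post.
# Assumes the input strings are UTF-8 encoded (override with `from`).
strip_accents <- function(x, from = "UTF-8") {
  # Step 1: transliterate to 7-bit ASCII; some iconv implementations
  # (e.g. macOS libiconv) put the accent in front of the base letter.
  ascii <- iconv(x, from = from, to = "ASCII//TRANSLIT")
  # Step 2: drop leftover stand-alone marks: backtick (grave),
  # apostrophe (acute), caret, tilde and double quote. Caution: genuine
  # apostrophes and quotes in the input are removed as well.
  gsub("[`'^~\"]", "", ascii)
}

strip_accents("it's")  # returns "its": the caveat above in action
```

The result still depends on the platform's iconv: with glibc on Linux the transliteration step typically yields the plain letters directly in a UTF-8 locale, but may produce "?" in the C locale.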

Run on a Mac with locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
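Because the iconv() output above is specific to this platform and locale, a fixed character-for-character mapping with base R's chartr() is one deterministic alternative. This is a swapped-in technique, sketched under stated assumptions: the name strip_accents_chartr is hypothetical, and only the 22 accented vowels from the tst example are handled; any other accented character passes through unchanged.

```r
# Hypothetical helper: map accented vowels to plain letters with chartr(),
# so the result does not depend on the platform's iconv implementation.
# Only the 22 characters from the tst example are covered; \u escapes keep
# the source file pure ASCII regardless of the session's encoding.
strip_accents_chartr <- function(x) {
  accented <- paste0(
    "\u00e0\u00e8\u00ec\u00f2\u00f9",        # grave, lower case
    "\u00c0\u00c8\u00cc\u00d2\u00d9",        # grave, upper case
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fd",  # acute, lower case (incl. y)
    "\u00c1\u00c9\u00cd\u00d3\u00da\u00dd"   # acute, upper case (incl. Y)
  )
  plain <- "aeiouAEIOUaeiouyAEIOUY"          # same order, one letter each
  chartr(accented, plain, x)
}

strip_accents_chartr(c("\u00e0", "\u00c9", "\u00fd"))  # returns c("a", "E", "y")
```

Extending the table to other letters (circumflex, tilde, diaeresis, cedilla) only requires appending matching characters to both strings, keeping them the same length.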

-- 
David.
>
> -Matt
>
> On Tue, 2010-09-07 at 13:04 -0400, lamack lamack wrote:
>> Dear all, is there an R function to remove all accents from strings?
>>
>> Best regards.
>>
>> JL
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> -- 
> Matthew S. Shotwell
> Graduate Student
> Division of Biostatistics and Epidemiology
> Medical University of South Carolina

David Winsemius, MD
West Hartford, CT


