[R] please comment on my function

jim holtman jholtman at gmail.com
Sat Sep 15 02:06:33 CEST 2012


You can always convert to lower case afterwards, probably on a shorter
vector.  You did not indicate that you needed that conversion; it only
looked like you did it for the regular expression.
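
Something along these lines might do it.  This is only a sketch of one
way to read "afterwards": it assumes the inputs contain no NAs, it
calls 'tolower' only on the entries that were not mapped to "unknown",
and it tests for both "c" and "C" because the case conversion has not
happened yet at that point.  The 'keep' variable and the extra "C"
test are my additions for illustration.

```r
canonicalize.language <- function (s) {
  # Strip the region part from 5-character codes such as "en_US" or "de-DE".
  long <- nchar(s) == 5
  s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", s[long])
  # Anything that is not a 2-character code or the "C" locale is unknown.
  # Both cases of "c" are tested because tolower() has not run yet.
  s[nchar(s) != 2 & !(s %in% c("c", "C"))] <- "unknown"
  # Lower-case only the surviving entries; "unknown" is already lower case.
  keep <- s != "unknown"
  s[keep] <- tolower(s[keep])
  s
}

canonicalize.language(c("EN", "en_US", "de-DE", "C", "english", "fr"))
# [1] "en"      "en"      "de"      "c"       "unknown" "fr"
```

Whether this is actually faster than calling 'tolower' up front depends
on how many entries end up as "unknown", so it is worth timing on your
real data.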

On Fri, Sep 14, 2012 at 3:13 PM, Sam Steingold <sds at gnu.org> wrote:
>> * jim holtman <wubygzna at tznvy.pbz> [2012-09-14 13:10:37 -0400]:
>>
>> more than half the time is in 'tolower' and 'nchar', so it is not all
>> 'sub's problem.
>
> aha, thanks!
>
>> This version runs a little faster since it does not need the 'tolower':
>>
>> canonicalize.language <- function (s) {
>>   # s <- tolower(s)
>>   long <- nchar(s) == 5
>>   s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$","\\1",s[long])
>>   s[nchar(s) != 2 & s != "c"] <- "unknown"
>>   s
>> }
>
> but it does not convert "EN" to "en", so it is not good for my purposes.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
> http://iris.org.il http://honestreporting.com http://memri.org
> Life is like Tetris: failures accumulate, successes fade.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



