[R-sig-teaching] Optimizing loop

Mark Sharp msharp at txbiomed.org
Fri Feb 6 23:46:53 CET 2015


Manel,

# I recommend the stringi package, which has deliberately copied much
#    of the syntax used by Hadley's wonderful stringr package.
# However, it does more and is much faster.
# Since you are not requiring anything complex you can use base R functionality
#    for everything, but I think the stringi syntax is cleaner and it is all
#    C++ code under the covers.

library(stringi)
set.seed(1)
len <- 10
# Let's make a dataframe with some data.
# The column name of 'NIF/NIE' is problematic.
# Although you can force it, R is not going to like it.
# Use an underscore or period.

df_2 <- data.frame('NIF_NIE' = sample(c(NA, 'starts with alpha', '1234numerals'),
                                      len, replace = TRUE),
                   col2 = 1:len, stringsAsFactors = FALSE)
df_2 # see what it looks like
df_2$NIF_NIE[!is.na(df_2$NIF_NIE) &
    stri_detect_regex(df_2$NIF_NIE, '^[0-9]')] <- 'SLOPD'
df_2

# end of code
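
For comparison, here is a minimal sketch of the same replacement using only base R, as mentioned in the comments above. It is an illustrative variant, not the code that was timed, and the names df_base and starts_with_digit are made up for the example. Because grepl() returns FALSE rather than NA for missing values, the separate is.na() test is not needed in this version.

```r
set.seed(1)
len <- 10
df_base <- data.frame(NIF_NIE = sample(c(NA, 'starts with alpha', '1234numerals'),
                                       len, replace = TRUE),
                      col2 = 1:len, stringsAsFactors = FALSE)
# grepl() yields FALSE (not NA) for missing values, so NA rows are left alone.
starts_with_digit <- grepl('^[0-9]', df_base$NIF_NIE)
df_base$NIF_NIE[starts_with_digit] <- 'SLOPD'
df_base
```

Either form replaces the row-by-row loop with a single vectorized assignment over the whole column.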

On my iMac with len set to 10^6, creating the data frame takes less than a tenth of a second:

> start.time <- Sys.time()
> len <- 1000000
> df_2 <- data.frame('NIF_NIE' = sample(c(NA, 'starts with alpha', '1234numerals'), len, replace = TRUE),
+     col2 = 1:len, stringsAsFactors = FALSE)
> end.time <- Sys.time()
> time.taken <- end.time - start.time
> time.taken
Time difference of 0.08968401 secs
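
A sketch for timing the replacement step itself, as a complement to the session above. It uses system.time() from base R together with the same stringi call as the code earlier in this message. The names df_big, hit and elapsed are illustrative, and no timing figure is given here because the result will vary by machine.

```r
library(stringi)
set.seed(1)
len <- 10^6
df_big <- data.frame(NIF_NIE = sample(c(NA, 'starts with alpha', '1234numerals'),
                                      len, replace = TRUE),
                     col2 = seq_len(len), stringsAsFactors = FALSE)
# Time only the vectorized detection and replacement.
elapsed <- system.time({
  hit <- !is.na(df_big$NIF_NIE) & stri_detect_regex(df_big$NIF_NIE, '^[0-9]')
  df_big$NIF_NIE[hit] <- 'SLOPD'
})
elapsed['elapsed']  # wall-clock seconds
```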

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org

> On Feb 5, 2015, at 2:08 AM, Manel Amado Martí <amado at cambrasabadell.org> wrote:
>
> I'm processing a database table. To do that, I load it into a data frame and then do the data processing (normalization of some fields). I'm used to programming in C, and some of R's facilities don't come naturally to me, so please excuse me if this is a question for "dummies".
> During processing, I want to replace a field's value depending on its current content. For example, if the field starts with a digit instead of an alphabetic character, I replace the entire field in that row with "SLOPD". I'm sure there is another way (maybe through some apply function), but I can't figure out how to do it.
> The code that I'm using now, is:
> for( i in 1:nrow(dataframe2)) {
>        if(is.na(dataframe2[i,"NIF/NIE"])==FALSE){
>                if(str_locate(dataframe2[i,"NIF/NIE"],"\\d")[1]<2){
>                        sprintf("elimina NIF autònom: % i\n",i)
>                        dataframe2[i,"NIF"]<-"SLOPD"}
>                }
>        }
> }
>
> Thank you for your attention!
>
> Manel Amado i Martí
> Cap d'Assessoria de Comerç Interior
> amado at cambrasabadell.org

