[R] what is the effective method to apply the below logic for ~1.2 million records in R

David Winsemius dwinsemius at comcast.net
Sun Sep 20 04:25:12 CEST 2015


On Sep 19, 2015, at 2:09 PM, Ravi Teja wrote:

> Hi,
> 
> I am trying to apply the below logic to generate flag_1 column on a data
> set consisting of ~1.2 million records in R.
> 
> Code :
> 
> for(i in 1: nrows)
>  {
>              if(A$customer[i]==A$customer[i+1])
>                {
> 
>                  if(is.na(A$Time_Diff[i]))
>                     A$flag_1[i] <- 1
>                     else if (A$Time_Diff[i] > 12)
>                     A$flag_1[i] <- 1
>                     else
>                     A$flag_1[i] <- A$flag_1[i-1]+1
> 
>               }
> 
>            else
>            {
> 
>              if(is.na(A$Time_Diff[i]))
>                     A$flag_1[i] <- 1
>                     else if (A$Time_Diff[i] > 12)
>                     A$flag_1[i] <- 1
>                     else
>                     A$flag_1[i] <- A$flag_1[i-1]+1
> 
>               }
> }

The inner logic of the consequent and the alternative appears identical, so the outer test on the customer column changes nothing. Vectorized approaches would surely be faster. You should post code that matches the data: in R, customer is not the same as Customer, and Time_diff is not Time_Diff. My patience for this code review has expired.

Post the output from the following, and do include the code that creates `nrows`:

 dput( head (A, 20) )
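
For illustration, a reproducible post might look roughly like the sketch below. The data frame is rebuilt by hand from the nine rows of the "resultant dataset" quoted further down, and the assumption that `nrows` was meant to be `nrow(A)` is a guess, since the original post never defines it.

```r
# Hypothetical reconstruction of the poster's data from the nine rows
# shown in the message; the real A has ~1.2 million rows.
A <- data.frame(
  Customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

# Assumption: nrows was intended to be the number of rows in A.
nrows <- nrow(A)

# Prints a plain-text representation of the first rows that readers
# can paste straight back into an R session.
dput(head(A, 20))
```

Pasting the `dput()` output into the message lets readers recreate exactly the same object, including the column names and types.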


> 
> Resultant dataset should look like
> 
> Customer   Time_diff    flag_1
> 1                   NA           1
> 1                   10             2
> 1                    8              3
> 1                    15            1
> 1                    9               2
> 1                    10              3
> 2                     NA            1
> 2                      2               2
> 2                      5               3
> 
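
Going only by the table above, one vectorized way to get flag_1 might be the sketch below. It assumes the column names Customer and Time_diff as printed in the table (not those used in the loop), assumes the rows are already sorted by customer and time, and assumes every customer's first row has an NA Time_diff so that the counter restarts there. The name `reset` is introduced here for illustration only.

```r
# Hypothetical data rebuilt from the table in the message.
A <- data.frame(
  Customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

# A row starts a new run when Time_diff is missing or greater than 12.
# is.na() comes first so that NA | ... evaluates to TRUE, never NA.
reset <- is.na(A$Time_diff) | A$Time_diff > 12

# cumsum(reset) labels each run; rle() gives the run lengths and
# sequence() counts 1, 2, 3, ... within each run.
A$flag_1 <- sequence(rle(cumsum(reset))$lengths)

print(A)
```

Everything here is a whole-vector operation, so there is no per-row assignment into the data frame, which is the expensive part of the loop. If a customer's first row could have a non-missing Time_diff, the reset condition would also need to test for a change in Customer.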
> The above logic will take approximately 60 hours to generate the flag_1
> column on a dataset consisting of ~1.2 million records. Is there any
> effective way in R to implement this logic in R ?
> 
> Appreciate your help.
> 
> Thanks,
> Ravi
> 
> 	[[alternative HTML version deleted]]

And R-help is a plain-text-only mailing list.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


