David Winsemius dwinsemius at comcast.net
Sun Sep 20 04:25:12 CEST 2015

```On Sep 19, 2015, at 2:09 PM, Ravi Teja wrote:

> Hi,
>
> I am trying to apply the below logic to generate flag_1 column on a data
> set consisting of ~1.2 million records in R.
>
> Code :
>
> for(i in 1: nrows)
>  {
>              if(A$customer[i]==A$customer[i+1])
>                {
>
>                  if(is.na(A$Time_Diff[i]))
>                     A$flag_1[i] <- 1
>                     else if (A$Time_Diff[i] > 12)
>                     A$flag_1[i] <- 1
>                     else
>                     A$flag_1[i] <- A$flag_1[i-1]+1
>
>               }
>
>            else
>            {
>
>              if(is.na(A$Time_Diff[i]))
>                     A$flag_1[i] <- 1
>                     else if (A$Time_Diff[i] > 12)
>                     A$flag_1[i] <- 1
>                     else
>                     A$flag_1[i] <- A$flag_1[i-1]+1
>
>               }
> }

The inner logic of the consequent and the alternative appears identical, so the customer comparison has no effect on the result. Vectorized approaches would surely be faster. You should post code that matches the data: in R, customer is not the same as Customer, and Time_diff is not Time_Diff. My patience for this code review has expired.
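
# A minimal sketch of one possible vectorized approach, not a tested solution
# for the real data. Assumptions (not confirmed by the poster): the columns are
# named Customer and Time_diff as in the example table below, rows are already
# sorted by customer and time, and Customer has no missing values. The small
# data frame A here is rebuilt from that example table only.
A <- data.frame(
  Customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

# TRUE on the first row of each customer (assumed intent of the customer test).
new_cust <- c(TRUE, A$Customer[-1] != A$Customer[-nrow(A)])

# The counter restarts at a new customer, a missing Time_diff, or a gap > 12.
reset <- new_cust | is.na(A$Time_diff) | A$Time_diff > 12

# For every row, look up the index of the most recent restart row.
start <- which(reset)[cumsum(reset)]

# Position within the current run, counting from 1.
A$flag_1 <- seq_along(reset) - start + 1L

# Everything above is a single pass over whole vectors, so there is no
# row-by-row assignment into the data frame inside a loop.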

Post the output from and do include code to create `nrows`:

>
> Resultant dataset should look like
>
> Customer   Time_diff    flag_1
> 1                   NA           1
> 1                   10             2
> 1                    8              3
> 1                    15            1
> 1                    9               2
> 1                    10              3
> 2                     NA            1
> 2                      2               2
> 2                      5               3
>
> The above logic will take approximately 60 hours to generate the flag_1
> column on a dataset consisting of ~1.2 million records. Is there any
> effective way in R to implement this logic in R ?
>
>
> Thanks,
> Ravi
>
AND R-help is a plain text only mailing list.
>
