[R] What is an efficient way to apply the logic below to ~1.2 million records in R?
Ista Zahn
istazahn at gmail.com
Sun Sep 20 04:48:43 CEST 2015
This assumes that the data are sorted by customer, that only the first
value of Time_Diff is missing for each customer, and that the first
value is always missing for each customer. If those assumptions hold,
you can do something like:
A <- read.table(text = "customer Time_Diff flag_1
1 NA 1
1 10 2
1 8 3
1 15 1
1 9 2
1 10 3
2 NA 1
2 2 2
2 5 3",
header = TRUE)
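A$flag_1 <- NULL  # drop the expected result; it is recomputed below
library(data.table)
A <- as.data.table(A)
## g15 goes up by one wherever Time_Diff moves from <= 12 to > 12
## (steps involving NA count as no change), labelling the runs
A[ , g15 := cumsum(c(0, ifelse(is.na(diff(Time_Diff > 12)), 0,
                               diff(Time_Diff > 12) > 0)))]
## I'm not proud of the previous line, probably there is a cleaner way
## number the rows 1, 2, ... within each customer/run combination
A[ , flag_1 := 1:.N, by = c("customer", "g15")]
A[ , g15 := NULL]  # remove the helper column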
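Under the same assumptions, a possibly cleaner grouping is to start a new run wherever Time_Diff is missing or greater than 12, then number the rows within each run. This also restarts the count when a gap above 12 comes right after the initial NA or right after another gap above 12, cases where the diff()-based g15 keeps counting while the original loop resets to 1. A minimal sketch; the helper column name `run` is only for illustration.

```r
library(data.table)

## Example data from the question (without the flag_1 column)
A <- data.table(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

## A row starts a new run if Time_Diff is missing (the first row of a
## customer, under the stated assumptions) or greater than 12.
## is.na(x) | x > 12 is TRUE, never NA, for missing values.
A[ , run := cumsum(is.na(Time_Diff) | Time_Diff > 12)]

## Number rows 1, 2, ... within each run; keeping customer in `by`
## also restarts the count if a customer's first value is not NA.
A[ , flag_1 := seq_len(.N), by = .(customer, run)]
A[ , run := NULL]  # drop the helper column

print(A)
```

Each step is a single vectorized pass over the table, so it should scale to ~1.2 million rows far better than a row-by-row loop.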
Best,
Ista
On Sat, Sep 19, 2015 at 5:09 PM, Ravi Teja <raviteja2504 at gmail.com> wrote:
> Hi,
>
> I am trying to apply the logic below to generate a flag_1 column on a
> data set of ~1.2 million records in R.
>
> Code:
>
> for (i in seq_len(nrow(A)))
> {
>   # restart the count on the first row, when the customer changes,
>   # when Time_Diff is missing, or when Time_Diff exceeds 12
>   if (i == 1 || A$customer[i] != A$customer[i - 1] ||
>       is.na(A$Time_Diff[i]) || A$Time_Diff[i] > 12)
>     A$flag_1[i] <- 1
>   else
>     A$flag_1[i] <- A$flag_1[i - 1] + 1
> }
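A sketch of the same loop that leaves the data frame alone while it runs: the columns are copied into plain vectors, the result is preallocated, and the column is written back once at the end. Each `A$flag_1[i] <- ...` on a data frame goes through `$<-.data.frame` and may copy the column, which is likely where much of the reported run time goes; the actual speed-up is not measured here. It assumes, as the expected output suggests, that the count should also restart when the customer changes. `cust`, `td`, and `flag` are illustrative names.

```r
## Example data from the question
A <- data.frame(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

## Work on plain vectors and preallocate the result,
## so each iteration updates a vector instead of the data frame.
cust <- A$customer
td   <- A$Time_Diff
flag <- integer(length(td))

for (i in seq_along(td)) {
  ## Start a new run on the first row, on a change of customer,
  ## on a missing Time_Diff, or on a gap greater than 12.
  ## `||` short-circuits, so cust[i - 1] is never read when i == 1.
  if (i == 1L || cust[i] != cust[i - 1L] ||
      is.na(td[i]) || td[i] > 12) {
    flag[i] <- 1L
  } else {
    flag[i] <- flag[i - 1L] + 1L
  }
}

A$flag_1 <- flag  # write the column back once, after the loop
```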
>
>
> The resulting dataset should look like this:
>
> customer Time_Diff flag_1
>        1        NA      1
>        1        10      2
>        1         8      3
>        1        15      1
>        1         9      2
>        1        10      3
>        2        NA      1
>        2         2      2
>        2         5      3
>
> With the loop above, generating the flag_1 column on ~1.2 million
> records would take approximately 60 hours. Is there a more efficient
> way to implement this logic in R?
>
> Appreciate your help.
>
> Thanks,
> Ravi
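The question can also be answered in base R without an explicit loop. A minimal sketch, assuming A is a data.frame sorted by customer: mark the rows where a new run starts, turn the marks into run labels with cumsum(), and number the rows within each run with sequence() and rle(). The names `new_customer`, `reset`, and `run` are hypothetical; ave(seq_along(run), run, FUN = seq_along) would be an alternative for the last step.

```r
## Example data from the question
A <- data.frame(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

## TRUE where a new run starts: the first row of a customer, a missing
## Time_Diff, or a gap greater than 12 (TRUE | NA is TRUE, so no NAs).
new_customer <- c(TRUE, A$customer[-1] != A$customer[-nrow(A)])
reset <- new_customer | is.na(A$Time_Diff) | A$Time_Diff > 12

## Label each run with a running count of resets. Runs are contiguous,
## so sequence(rle(run)$lengths) numbers the rows 1, 2, ... in each run.
run <- cumsum(reset)
A$flag_1 <- sequence(rle(run)$lengths)

print(A)
```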