[R] How to optimizing the code for calculating the launch time
David McPearson
dmcp at webmail.co.za
Wed Aug 6 09:59:31 CEST 2014
On Tue, 5 Aug 2014 09:51:56 +0300 Lingyi Ma <lingyi.ma at gmail.com> wrote
> My dataset:
>
> Item_Id Year_Month
> B65623262 201204
> B58279745 201204
> B33671102 201204
> B36630946 201204
> B63270151 201204
> B63270133 201204
>
>
>
> I have written my code to calculate one more column which is the product
> maturity time as the following:
>
> launchtime<-function(g){
>
> su<-data.frame(NULL) #product maturity time
>
> g<-g[order(g$Item_Id,g$Year_Month),]
>
> #get the lauching time
> index2<-unique(g$Item_Id)
>
> for(u in 1:length(index2)){
> m2<-g[g$Item_Id==index2[u],]
>
> lt<-numeric(0)
> lt[1]<-0
> year<-as.numeric(substring(m2$Year_Month,1,4))
> month<-as.numeric(substring(m2$Year_Month,5,6))
> if(dim(m2)[1]==1){}else{
> for(i in 1:(dim(m2)[1]-1)){
> j<-i+1
> lt[j]<-(12*(year[j]-year[1])+month[j])-(month[1])
> }}
> g2<-cbind(m2,lt)
> su<-rbind(su,g2)
> }
>
> return(su)
>
> }
>
>
>
> How to optimize my code-. it takes so long time to run.
>
>
> Kind regards,
> Tammy
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Please read the posting guide - and the footer to just about every message on
the list (just above): Do not post html - it makes your messages hard to read.
Sample data and an example of the expected output will also go a long way to
help us understand your problem and help you.
Having writ that, welcome to Circles 1, 2, 3, 3.1, and 8.1.30 of The R Inferno
http://www.burns-stat.com/documents/books/the-r-inferno/
Get a copy. Read it. The time it takes will pay off ten-fold in coping with
common R mistakes. (And we should all thank Dr Burns for providing the
download at the best of all possible prices...) (Don't try and understand it
all at once - take note of the bits that make sense.)
To optimise your code:
* get rid of the inner loop (I don't think you need it)
* do not grow objects if possible, especially do not grow them inside a loop
* to get the results (I think) you want, put in the parentheses you (seem to
have) missed.
My version of your code below. Hopefully the comments will be of assistance.
#====================================================================
launchtime <- function(g) {
g <- cbind(g[order(g$Item_Id,g$Year_Month),], launch = NA]
# Inside a function you are working with a copy of the data.
# This will not change the original and will (should be) safe.
#get the lauching time
index2 <- unique(g$Item_Id)
next_row <- 1
# for(u in 1:length(index2)){
# m2<-g[g$Item_Id==index2[u],]
# for (u in seq_along(index2)) is better, but in this case I prefer
for (u in index2) {
m2 <- g[g$Item_Id == index2]
year<-as.numeric(substring(m2$Year_Month,1,4))
month<-as.numeric(substring(m2$Year_Month,5,6))
# hmmm - probably better to use "as.integer()" in this case.
# Also, if Year_Month is already numeric, the following might
# work better (or not...)
# year <- floor(m2$Year_Month / 100)
# month <- m2$Year_Month %% 100
# I don't think you need to loop...
lt <- 12 * (year - year[1]) + month - month[1]
# (Parentheses are mine - but I'm sure you need them)
g$launch[next_row:(next_row + length(lt) - 1)] <- lt
next_row <- next_row + length(lt)
# given that we've eliminated the inner loop, you could probably
# cbind() and rbind() OK - but this should be (at least a bit)
# faster
}
g
}
#
#====================================================================
Cheers.
____________________________________________________________
South Africas premier free email service - www.webmail.co.za
Cotlands - Shaping tomorrows Heroes http://www.cotlands.org.za/
More information about the R-help
mailing list