[R] how to get rid of 2 for-loops and optimize runtime
William Dunlap
wdunlap at tibco.com
Tue Oct 20 18:13:26 CEST 2009
> -----Original Message-----
> From: Ian Willems [mailto:ian.willems at uz.kuleuven.ac.be]
> Sent: Tuesday, October 20, 2009 6:46 AM
> To: William Dunlap; r-help at r-project.org
> Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime
>
> Hi William,
>
> Your program works perfectly and very fast for the table I'm
> using right now (only one match per row).
> If I want to reuse this code on other tables, a row can match
> more than one row.
> Is it possible to adapt your code easily if I have to sum
> the values of last month from different rows?
You can use aggregate() with one of those
keys to sum up the values with a common key
value.
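
A minimal sketch of how that could look, assuming the same kind of keys as in my earlier message. The extra August row for service A and the name 'totals' are only illustrative, not taken from your data. It sums Amount per key with aggregate() and then looks up each row's previous-month key in the aggregated table with match():

```r
# Example data with two rows for service A in August 2009 (an assumed
# variation of the test data, to show more than one match per row).
d <- data.frame(
  Year    = c(2009, 2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 8, 7),
  Service = c("A", "B", "A", "A", "A"),
  Amount  = c(120, 80, 40, 10, 50),
  stringsAsFactors = FALSE
)

# Key for a row's own month, and key for the month before it.
dKey <- function(d) paste(d$Year, d$month, d$Service, sep = ";")
keyPrevMonth <- function(d) {
  isJan <- d$month == 1
  d$Year[isJan] <- d$Year[isJan] - 1
  d$month[isJan] <- 12
  d$month[!isJan] <- d$month[!isJan] - 1
  dKey(d)
}

# Sum Amount over all rows that share a key: one row per distinct key.
totals <- aggregate(d["Amount"], by = list(key = dKey(d)), FUN = sum)

# Look up each row's previous-month key in the aggregated table.
d$AmountPrevMonth <- totals$Amount[match(keyPrevMonth(d), totals$key)]
print(d)
```

With this data the two August rows for service A (40 and 10) are summed first, so the September row for A gets 50 as its previous-month amount. Rows with no data for the previous month get NA.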
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> Thank you for your help
> regards,
>
> Ian
>
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Monday, October 19, 2009 18:08
> To: Ian Willems; r-help at r-project.org
> Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Ian Willems
> > Sent: Monday, October 19, 2009 6:50 AM
> > To: 'r-help at r-project.org'
> > Subject: [R] how to get rid of 2 for-loops and optimize runtime
> >
> > Short: get rid of the loops I use and optimize runtime
> >
> > Dear all,
> >
> > For each row, I want to calculate the amount from the month before.
> > I use a matrix with 2100 rows and 22 columns (which is still a
> > very small matrix; the number of rows in other matrices can easily
> > be more than 100000).
> >
> > Table before
> > Year month quarter yearmonth Service ... Amount
> > 2009 9     Q3      092009    A       ... 120
> > 2009 9     Q3      092009    B       ... 80
> > 2009 8     Q3      082009    A       ... 40
> > 2009 7     Q3      072009    A       ... 50
> >
> > The result I want
> > Year month quarter yearmonth Service ... Amount amount_lastmonth
> > 2009 9     Q3      092009    A       ... 120    40
> > 2009 9     Q3      092009    B       ... 80     ...
> > 2009 8     Q3      082009    A       ... 40     50
> > 2009 7     Q3      072009    A       ... 50     ...
> >
> > The table is not exactly the same, but it gives a good idea of what I
> > have and what I want.
> >
> > The code I have written (see below) does what I want, but it
> > is very slow. It takes 129 s for 400 rows, and the time
> > quadruples each time I double the number of rows.
> > I'm new to programming in R, but I found that you can use
> > Rprof and summaryRprof to analyse your code (output below).
> > However, I don't really understand the output.
> > I guess I need code that runs in linear time and need to get
> > rid of the 2 for loops.
> > Can someone help me or tell me what else I can do to optimize
> > my runtime?
> >
> > I use R 2.9.2 on
> > Windows XP Service Pack 3.
> >
> > Thank you in advance
> >
> > Best regards,
> >
> > Willems Ian
> >
> >
> > *****************************
> > dataset[,3]= year
> > dataset[,5]= month
> > dataset[,14]= servicetype
> > dataset[,18]= department
> > dataset[,22]= amount
> >
> > [CODE]
> > # For each row j of the matrix, check every row i and add up the
> > # amounts of the rows that have ...
> > for (j in 1:Number_rows) {
> >   sum <- 0
> >   for (i in 1:Number_rows) {
> >     if (dataset[j,14] == dataset[i,14]) {      # ... the same service type
> >       if (dataset[j,18] == dataset[i,18]) {    # ... the same department
> >         if (dataset[j,5] == "1") {
> >           # if month == 1, the month ago is 12 and the year is year - 1
> >           if ("12" == dataset[i,5]) {
> >             if ((dataset[j,3] - 1) == dataset[i,3]) {
> >               sum <- sum + dataset[i,22]
> >             }
> >           }
> >         } else {
> >           # if month != 1, the month ago is month - 1
> >           if ((dataset[j,5] - 1) == dataset[i,5]) {
> >             if (dataset[j,3] == dataset[i,3]) {
> >               sum <- sum + dataset[i,22]
> >             }
> >           }
> >         }
> >       }
> >     }
> >   }
> >   # sum now holds last month's total for row j
> > }
>
> match() is often useful for quickly finding the locations of
> many items in a vector. It has no special methods for data.frames
> so you must combine the columns of interest into a character
> vector of keys and use match on the key vectors. E.g.
>
> # your test data in a format that mail readers
> # can copy and paste into R:
> d <- read.table(header=TRUE, textConnection("
> Year month quarter yearmonth Service Amount
> 2009 9 Q3 092009 A 120
> 2009 9 Q3 092009 B 80
> 2009 8 Q3 082009 A 40
> 2009 7 Q3 072009 A 50
> "))
> # The key functions
> dKey <- function(d) {
>   with(d, paste(Year, month, Service, sep=";"))
> }
> keyThisMonth <- function(d) dKey(d)
> keyPrevMonth <- function(d) {
> stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service))
> isJan <- d$month==1
> d$Year[isJan] <- d$Year[isJan] - 1
> d$month[isJan] <- 12
> d$month[!isJan] <- d$month[!isJan] - 1
> dKey(d)
> }
> # Make the new column:
> d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d),
> keyThisMonth(d)) ]
> # The result
> print(d)
>
> Year month quarter yearmonth Service Amount AmountPrevMonth
> 1 2009 9 Q3 92009 A 120 40
> 2 2009 9 Q3 92009 B 80 NA
> 3 2009 8 Q3 82009 A 40 50
> 4 2009 7 Q3 72009 A 50 NA
>
> This assumes there is only one match per row. Is this the
> result you are looking for?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> >
> > [\Code]
> >
> > > summaryRprof()
> > $by.self
> > self.time self.pct total.time total.pct
> > [.data.frame 33.92 26.2 80.90 62.5
> > NextMethod 12.68 9.8 12.68 9.8
> > [.factor 8.60 6.6 18.36 14.2
> > Ops.factor 8.10 6.3 40.08 31.0
> > sort.int 6.82 5.3 13.70 10.6
> > [ 6.70 5.2 85.44 66.0
> > names 6.54 5.1 6.54 5.1
> > length 5.66 4.4 5.66 4.4
> > == 5.04 3.9 44.92 34.7
> > levels 4.80 3.7 5.56 4.3
> > is.na 4.24 3.3 4.24 3.3
> > dim 3.66 2.8 3.66 2.8
> > switch 3.60 2.8 3.80 2.9
> > vector 2.68 2.1 8.02 6.2
> > inherits 1.90 1.5 1.90 1.5
> > any 1.68 1.3 1.68 1.3
> > noNA.levels 1.46 1.1 7.84 6.1
> > .Call 1.40 1.1 1.40 1.1
> > ! 1.26 1.0 1.26 1.0
> > attr<- 1.06 0.8 1.06 0.8
> > .subset 1.00 0.8 1.00 0.8
> > class<- 0.82 0.6 0.82 0.6
> > != 0.80 0.6 0.80 0.6
> > levels.default 0.68 0.5 0.76 0.6
> > all 0.62 0.5 0.62 0.5
> > < 0.54 0.4 0.54 0.4
> > - 0.48 0.4 0.48 0.4
> > is.factor 0.44 0.3 2.34 1.8
> > .subset2 0.38 0.3 0.38 0.3
> > attr 0.36 0.3 0.36 0.3
> > is.character 0.28 0.2 0.28 0.2
> > is.null 0.28 0.2 0.28 0.2
> > | 0.26 0.2 0.26 0.2
> > oldClass<- 0.20 0.2 0.20 0.2
> > is.atomic 0.16 0.1 0.16 0.1
> > nzchar 0.10 0.1 0.10 0.1
> > is.numeric 0.06 0.0 0.06 0.0
> > oldClass 0.06 0.0 0.06 0.0
> > ( 0.04 0.0 0.04 0.0
> > [.data 0.02 0.0 0.02 0.0
> >
> > $by.total
> > total.time total.pct self.time self.pct
> > [ 85.44 66.0 6.70 5.2
> > [.data.frame 80.90 62.5 33.92 26.2
> > == 44.92 34.7 5.04 3.9
> > Ops.factor 40.08 31.0 8.10 6.3
> > [.factor 18.36 14.2 8.60 6.6
> > sort.int 13.70 10.6 6.82 5.3
> > NextMethod 12.68 9.8 12.68 9.8
> > vector 8.02 6.2 2.68 2.1
> > noNA.levels 7.84 6.1 1.46 1.1
> > names 6.54 5.1 6.54 5.1
> > length 5.66 4.4 5.66 4.4
> > levels 5.56 4.3 4.80 3.7
> > is.na 4.24 3.3 4.24 3.3
> > switch 3.80 2.9 3.60 2.8
> > dim 3.66 2.8 3.66 2.8
> > is.factor 2.34 1.8 0.44 0.3
> > inherits 1.90 1.5 1.90 1.5
> > any 1.68 1.3 1.68 1.3
> > .Call 1.40 1.1 1.40 1.1
> > ! 1.26 1.0 1.26 1.0
> > attr<- 1.06 0.8 1.06 0.8
> > .subset 1.00 0.8 1.00 0.8
> > class<- 0.82 0.6 0.82 0.6
> > != 0.80 0.6 0.80 0.6
> > levels.default 0.76 0.6 0.68 0.5
> > all 0.62 0.5 0.62 0.5
> > < 0.54 0.4 0.54 0.4
> > - 0.48 0.4 0.48 0.4
> > .subset2 0.38 0.3 0.38 0.3
> > attr 0.36 0.3 0.36 0.3
> > is.character 0.28 0.2 0.28 0.2
> > is.null 0.28 0.2 0.28 0.2
> > | 0.26 0.2 0.26 0.2
> > oldClass<- 0.20 0.2 0.20 0.2
> > is.atomic 0.16 0.1 0.16 0.1
> > nzchar 0.10 0.1 0.10 0.1
> > is.numeric 0.06 0.0 0.06 0.0
> > oldClass 0.06 0.0 0.06 0.0
> > ( 0.04 0.0 0.04 0.0
> > [.data 0.02 0.0 0.02 0.0
> >
> > $sampling.time
> > [1] 129.38
> >
> > Warning message:
> > In readLines(filename, n = chunksize) :
> > incomplete final line found on 'Rprof.out'
> >