[R] how to get rid of 2 for-loops and optimize runtime
Ian Willems
ian.willems at uz.kuleuven.ac.be
Tue Oct 20 15:16:05 CEST 2009
Hi Joris,
The amount from a month ago is normally a single value from another row.
However, I used 'sum<-sum + dataset[i,22]' because I would also like to reuse the code for other tables. In some of those tables, last month's value can be the sum of values from several rows.
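The summing behaviour described above can be kept without nested loops. What follows is only a minimal sketch under stated assumptions: the column names (year, month, service, department, amount, amount_lastmonth) and the toy values are hypothetical stand-ins for the column positions in the real table, and year and month are assumed to be numeric rather than factors. It totals the amount per group and month with aggregate(), relabels each total with the following month, and joins it back with merge().

```r
# Toy data; the column names are illustrative, not those of the real table.
dataset <- data.frame(
  year       = c(2009, 2009, 2009, 2009),
  month      = c(9, 9, 8, 7),
  service    = c("A", "B", "A", "A"),
  department = c("X", "X", "X", "X"),
  amount     = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# Total amount per service, department, year and month.
# Summing here covers tables where one month is spread over several rows.
totals <- aggregate(dataset["amount"],
                    by = dataset[c("service", "department", "year", "month")],
                    FUN = sum)

# Relabel each total with the month that follows it, so it lines up with the
# rows for which it is "last month". December rolls over to January of the
# next year; the year must be updated before the month is overwritten.
totals$year  <- ifelse(totals$month == 12, totals$year + 1, totals$year)
totals$month <- ifelse(totals$month == 12, 1, totals$month + 1)
names(totals)[names(totals) == "amount"] <- "amount_lastmonth"

# Left join: rows without a matching previous month get NA.
result <- merge(dataset, totals,
                by = c("service", "department", "year", "month"),
                all.x = TRUE)
```

Because the work is done by one grouped sum and one join instead of comparing every row with every other row, the run time should no longer grow with the square of the number of rows. Note that merge() returns the rows sorted by the key columns, not in their original order.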
Thank you for your time.
Greetings,
Ian
-----Original message-----
From: joris meys [mailto:jorismeys at gmail.com]
Sent: Monday 19 October 2009 16:12
To: Ian Willems
CC: r-help at r-project.org
Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime
Hi Ian,
First of all, take a look at the functions sapply, mapply, lapply,
tapply, ... : they are usually a cleaner and often faster way of
implementing loops.
Second, could you elaborate a bit further on the data set: is the amount
from a month ago one value from another row, or the sum of
all values in the previous month? I saw in your example dataset that
the last month has 2 rows, but couldn't figure out whether that's a
typo or really means something. That information is necessary to
optimize your code. 129s is indeed far too long for such a simple action.
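
As a small illustration of the functions named above, here is a sketch on made-up data (the vectors amount and service are invented for the example, not taken from the real table). tapply() computes a grouped sum in a single call, and sapply() replaces a loop that collects one value per element.

```r
# Made-up amounts and grouping labels, for illustration only.
amount  <- c(120, 80, 40, 50)
service <- c("A", "B", "A", "A")

# Grouped sum without an explicit loop: one total per service.
per_service <- tapply(amount, service, sum)

# sapply() applies a function to each element and simplifies the result
# to a vector, replacing a for loop that fills a result vector.
squares <- sapply(1:4, function(k) k^2)
```

Whether this is faster than a loop depends on the case; the larger gain usually comes from working on whole columns at once instead of indexing a data frame cell by cell.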
Cheers
Joris
On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems
<ian.willems at uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> I want to calculate, for each row, the amount from a month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; the number of rows of other matrices can easily be more than 100000).
>
> Table before
> Year month quarter yearmonth Service ... Amount
> 2009 9 Q3 092009 A ... 120
> 2009 9 Q3 092009 B ... 80
> 2009 8 Q3 082009 A ... 40
> 2009 7 Q3 072009 A ... 50
>
> The result I want
> Year month quarter yearmonth Service ... Amount amount_lastmonth
> 2009 9 Q3 092009 A ... 120 40
> 2009 9 Q3 092009 B ... 80 ...
> 2009 8 Q3 082009 A ... 40 50
> 2009 7 Q3 072009 A ... 50 ...
>
> The table is not exactly the same, but it gives a good idea of what I have and what I want.
>
> The code I have written (see below) does what I want, but it is very slow. It takes 129s for 400 rows, and the time gets four times higher each time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output below).
> However, I don't really understand the output.
> I guess I need code that runs in linear time, which means getting rid of the 2 for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2 on Windows XP Service Pack 3.
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,3]  = year
> dataset[,5]  = month
> dataset[,14] = servicetype
> dataset[,18] = department
> dataset[,22] = amount
>
> [CODE]
> # For each row j of the matrix, sum the amounts of all rows i that have
> # the same service type and department and fall in the previous month.
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {          # the same service type
>       if (dataset[j,18] == dataset[i,18]) {        # the same department
>         if (dataset[j,5] == "1") {                 # if month = 1, month ago is 12 and year is -1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3]-1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {
>           if ((dataset[j,5]-1) == dataset[i,5]) {  # if month != 1, month ago is month - 1
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> [\Code]
>
>> summaryRprof()
> $by.self
> self.time self.pct total.time total.pct
> [.data.frame 33.92 26.2 80.90 62.5
> NextMethod 12.68 9.8 12.68 9.8
> [.factor 8.60 6.6 18.36 14.2
> Ops.factor 8.10 6.3 40.08 31.0
> sort.int 6.82 5.3 13.70 10.6
> [ 6.70 5.2 85.44 66.0
> names 6.54 5.1 6.54 5.1
> length 5.66 4.4 5.66 4.4
> == 5.04 3.9 44.92 34.7
> levels 4.80 3.7 5.56 4.3
> is.na 4.24 3.3 4.24 3.3
> dim 3.66 2.8 3.66 2.8
> switch 3.60 2.8 3.80 2.9
> vector 2.68 2.1 8.02 6.2
> inherits 1.90 1.5 1.90 1.5
> any 1.68 1.3 1.68 1.3
> noNA.levels 1.46 1.1 7.84 6.1
> .Call 1.40 1.1 1.40 1.1
> ! 1.26 1.0 1.26 1.0
> attr<- 1.06 0.8 1.06 0.8
> .subset 1.00 0.8 1.00 0.8
> class<- 0.82 0.6 0.82 0.6
> != 0.80 0.6 0.80 0.6
> levels.default 0.68 0.5 0.76 0.6
> all 0.62 0.5 0.62 0.5
> < 0.54 0.4 0.54 0.4
> - 0.48 0.4 0.48 0.4
> is.factor 0.44 0.3 2.34 1.8
> .subset2 0.38 0.3 0.38 0.3
> attr 0.36 0.3 0.36 0.3
> is.character 0.28 0.2 0.28 0.2
> is.null 0.28 0.2 0.28 0.2
> | 0.26 0.2 0.26 0.2
> oldClass<- 0.20 0.2 0.20 0.2
> is.atomic 0.16 0.1 0.16 0.1
> nzchar 0.10 0.1 0.10 0.1
> is.numeric 0.06 0.0 0.06 0.0
> oldClass 0.06 0.0 0.06 0.0
> ( 0.04 0.0 0.04 0.0
> [.data 0.02 0.0 0.02 0.0
>
> $by.total
> total.time total.pct self.time self.pct
> [ 85.44 66.0 6.70 5.2
> [.data.frame 80.90 62.5 33.92 26.2
> == 44.92 34.7 5.04 3.9
> Ops.factor 40.08 31.0 8.10 6.3
> [.factor 18.36 14.2 8.60 6.6
> sort.int 13.70 10.6 6.82 5.3
> NextMethod 12.68 9.8 12.68 9.8
> vector 8.02 6.2 2.68 2.1
> noNA.levels 7.84 6.1 1.46 1.1
> names 6.54 5.1 6.54 5.1
> length 5.66 4.4 5.66 4.4
> levels 5.56 4.3 4.80 3.7
> is.na 4.24 3.3 4.24 3.3
> switch 3.80 2.9 3.60 2.8
> dim 3.66 2.8 3.66 2.8
> is.factor 2.34 1.8 0.44 0.3
> inherits 1.90 1.5 1.90 1.5
> any 1.68 1.3 1.68 1.3
> .Call 1.40 1.1 1.40 1.1
> ! 1.26 1.0 1.26 1.0
> attr<- 1.06 0.8 1.06 0.8
> .subset 1.00 0.8 1.00 0.8
> class<- 0.82 0.6 0.82 0.6
> != 0.80 0.6 0.80 0.6
> levels.default 0.76 0.6 0.68 0.5
> all 0.62 0.5 0.62 0.5
> < 0.54 0.4 0.54 0.4
> - 0.48 0.4 0.48 0.4
> .subset2 0.38 0.3 0.38 0.3
> attr 0.36 0.3 0.36 0.3
> is.character 0.28 0.2 0.28 0.2
> is.null 0.28 0.2 0.28 0.2
> | 0.26 0.2 0.26 0.2
> oldClass<- 0.20 0.2 0.20 0.2
> is.atomic 0.16 0.1 0.16 0.1
> nzchar 0.10 0.1 0.10 0.1
> is.numeric 0.06 0.0 0.06 0.0
> oldClass 0.06 0.0 0.06 0.0
> ( 0.04 0.0 0.04 0.0
> [.data 0.02 0.0 0.02 0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
> incomplete final line found on 'Rprof.out'
>