[R] how to get rid of 2 for-loops and optimize runtime

Tue Oct 20 15:16:05 CEST 2009

Hi Joris,

The amount of a month ago is normally a single value from another row.
However, I used 'sum<-sum + dataset[i,22]' because I would also like to reuse the code for other tables. In some of those tables the value of last month can be the sum of values from several rows.

Thank you for your time.
Greetings,

Ian

-----Original message-----
From: joris meys [mailto:jorismeys at gmail.com]
Sent: Monday 19 October 2009 16:12
To: Ian Willems
CC: r-help at r-project.org
Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime

Hi Ian,

First of all, take a look at the functions sapply, mapply, lapply,
tapply, ...: they are often a more compact and more efficient way of
implementing loops.
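
As a rough illustration of that idea, the sketch below replaces the inner loop by vectorised comparisons and lets sapply run over the rows. The toy data and the column names (year, month, service, amount) are assumptions modelled on the example table, not the real 22-column dataset, and the department check is left out for brevity.

```r
# Toy data modelled on the example table (column names are assumptions).
dataset <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# For each row j, sum the amounts of all rows with the same service
# in the previous month (December of the previous year for January).
amount_lastmonth <- sapply(seq_len(nrow(dataset)), function(j) {
  is_january <- dataset$month[j] == 1
  prev_month <- if (is_january) 12 else dataset$month[j] - 1
  prev_year  <- if (is_january) dataset$year[j] - 1 else dataset$year[j]
  hit <- dataset$service == dataset$service[j] &
         dataset$month   == prev_month &
         dataset$year    == prev_year
  sum(dataset$amount[hit])   # 0 when no row matches, as in the original loop
})

dataset$amount_lastmonth <- amount_lastmonth
```

This still compares every row with every other row, so the runtime keeps growing quadratically; it only removes the slow element-by-element indexing from the inner loop.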

Second, could you elaborate a bit further on the data set: is the amount
of the month ago one value from another row, or the sum of
all values in the previous month? I saw in your example dataset that
the last month has 2 rows, but couldn't figure out whether that's a
typo or really means something. I need that information to
optimize your code. 129 s is indeed far too long for such a simple operation.
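
If the amount of the month ago can be the sum of several rows, one possible way to avoid both loops is to total the amounts per service and month first, and then look up the total of the previous month for every row. The following is only a minimal sketch under assumptions: the toy data and the column names (year, month, service, amount) are made up for illustration, the department column is ignored, and rows without a previous month get NA rather than 0.

```r
# Toy data; August has two rows for service A (column names are assumptions).
dataset <- data.frame(
  year    = c(2009, 2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 8, 7),
  service = c("A", "B", "A", "A", "A"),
  amount  = c(120, 80, 40, 10, 50),
  stringsAsFactors = FALSE
)

# Step 1: one total per (service, year, month); this adds up multiple rows.
totals <- aggregate(
  dataset["amount"],
  by  = list(service = dataset$service,
             year    = dataset$year,
             month   = dataset$month),
  FUN = sum
)

# Step 2: previous month for every row, wrapping January to December.
prev_month <- ifelse(dataset$month == 1, 12, dataset$month - 1)
prev_year  <- ifelse(dataset$month == 1, dataset$year - 1, dataset$year)

# Step 3: look the totals up through a pasted key; no match gives NA.
key_totals <- paste(totals$service, totals$year, totals$month)
key_prev   <- paste(dataset$service, prev_year, prev_month)
dataset$amount_lastmonth <- totals$amount[match(key_prev, key_totals)]
```

Each step works on whole columns at once, so no row is compared with every other row. Adding the department would mean including it in both the grouping list and the two keys.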

Cheers
Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems
<ian.willems at uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> For each row, I want to calculate the amount of the month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; the number of rows of other matrices can easily exceed 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...  80
> 2009  8      Q3       082009     A        ...  40
> 2009  7      Q3       072009     A        ...  50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...  80      ...
> 2009  8      Q3       082009     A        ...  40      50
> 2009  7      Q3       072009     A        ...  50      ...
>
> The table is not exactly the same, but it gives a good idea of what I have and what I want.
>
> The code I have written (see below) does what I want, but it is very slow: it takes 129 s for 400 rows, and the runtime quadruples each time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output below).
> However, I don't really understand the output.
> I guess I need code that runs in linear time, which means getting rid of the 2 for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2
> Windows XP Service Pack 3
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,3]  = year
> dataset[,5]  = month
> dataset[,14] = servicetype
> dataset[,18] = department
> dataset[,22] = amount
>
> [CODE]
> # For each row j, sum the amounts of all rows i that have the same
> # service type and department and belong to the previous month.
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {          # the same service type
>       if (dataset[j,18] == dataset[i,18]) {        # the same department
>         if (dataset[j,5] == "1") {                 # if month = 1, month ago is 12 and year is -1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3]-1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {                                   # if month != 1, month ago is month - 1
>           if ((dataset[j,5]-1) == dataset[i,5]) {
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
> [/CODE]
>
>> summaryRprof()
> $by.self
>               self.time self.pct total.time total.pct
> [.data.frame       33.92  26.2    80.90      62.5
> NextMethod         12.68  9.8     12.68       9.8
> [.factor            8.60  6.6      18.36      14.2
> Ops.factor          8.10  6.3      40.08      31.0
> sort.int            6.82  5.3      13.70      10.6
> [                   6.70  5.2      85.44      66.0
> names               6.54  5.1       6.54       5.1
> length              5.66  4.4       5.66       4.4
> ==                  5.04  3.9      44.92      34.7
> levels              4.80  3.7       5.56       4.3
> is.na               4.24  3.3       4.24       3.3
> dim                 3.66  2.8       3.66       2.8
> switch              3.60  2.8       3.80       2.9
> vector              2.68  2.1       8.02       6.2
> inherits            1.90  1.5       1.90       1.5
> any                 1.68  1.3       1.68       1.3
> noNA.levels         1.46  1.1       7.84       6.1
> .Call               1.40  1.1       1.40       1.1
> !                   1.26  1.0       1.26       1.0
> attr<-              1.06  0.8       1.06       0.8
> .subset             1.00  0.8       1.00       0.8
> class<-             0.82  0.6       0.82       0.6
> !=                  0.80  0.6       0.80       0.6
> levels.default      0.68  0.5       0.76       0.6
> all                 0.62  0.5       0.62       0.5
> <                   0.54  0.4       0.54       0.4
> -                   0.48  0.4       0.48       0.4
> is.factor           0.44  0.3       2.34       1.8
> .subset2            0.38  0.3       0.38       0.3
> attr                0.36  0.3       0.36       0.3
> is.character        0.28  0.2       0.28       0.2
> is.null             0.28  0.2       0.28       0.2
> |                   0.26  0.2       0.26       0.2
> oldClass<-          0.20  0.2       0.20       0.2
> is.atomic           0.16  0.1       0.16       0.1
> nzchar              0.10  0.1       0.10       0.1
> is.numeric          0.06  0.0       0.06       0.0
> oldClass            0.06  0.0       0.06       0.0
> (                   0.04  0.0       0.04       0.0
> [.data              0.02  0.0       0.02       0.0
>
> $by.total
>               total.time total.pct self.time self.pct
> [                   85.44  66.0      6.70      5.2
> [.data.frame        80.90  62.5     33.92     26.2
> ==                  44.92  34.7      5.04      3.9
> Ops.factor          40.08  31.0      8.10      6.3
> [.factor            18.36  14.2      8.60      6.6
> sort.int            13.70  10.6      6.82      5.3
> NextMethod          12.68  9.8     12.68      9.8
> vector               8.02  6.2      2.68      2.1
> noNA.levels          7.84  6.1      1.46      1.1
> names                6.54  5.1      6.54      5.1
> length               5.66  4.4      5.66      4.4
> levels               5.56  4.3      4.80      3.7
> is.na                4.24  3.3      4.24      3.3
> switch               3.80  2.9      3.60      2.8
> dim                  3.66  2.8      3.66      2.8
> is.factor            2.34  1.8      0.44      0.3
> inherits             1.90  1.5      1.90      1.5
> any                  1.68  1.3      1.68      1.3
> .Call                1.40  1.1      1.40      1.1
> !                    1.26  1.0      1.26      1.0
> attr<-               1.06  0.8      1.06      0.8
> .subset              1.00  0.8      1.00      0.8
> class<-              0.82  0.6      0.82      0.6
> !=                   0.80  0.6      0.80      0.6
> levels.default       0.76  0.6      0.68      0.5
> all                  0.62  0.5      0.62      0.5
> <                    0.54  0.4      0.54      0.4
> -                    0.48  0.4      0.48      0.4
> .subset2             0.38  0.3      0.38      0.3
> attr                 0.36  0.3      0.36      0.3
> is.character         0.28  0.2      0.28      0.2
> is.null              0.28  0.2      0.28      0.2
> |                    0.26  0.2      0.26      0.2
> oldClass<-           0.20  0.2      0.20      0.2
> is.atomic            0.16  0.1      0.16      0.1
> nzchar               0.10  0.1      0.10      0.1
> is.numeric           0.06  0.0      0.06      0.0
> oldClass             0.06  0.0      0.06      0.0
> (                    0.04  0.0      0.04      0.0
> [.data               0.02  0.0      0.02      0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
>  incomplete final line found on 'Rprof.out'
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>