[R] how to get rid of 2 for-loops and optimize runtime

joris meys jorismeys at gmail.com
Mon Oct 19 16:12:01 CEST 2009


Hi Ian,

first of all, take a look at the functions sapply, mapply, lapply,
tapply, ... : they are a more compact way of expressing loops, and
combined with vectorized operations they are usually faster too.
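
As a rough illustration of what these functions look like in use, here
is a minimal sketch on toy data. The vectors `amounts` and `service`
are made up for the example and are not your dataset.

```r
# Toy data (hypothetical, not the poster's dataset)
amounts <- c(120, 80, 40, 50)
service <- c("A", "B", "A", "A")

# Explicit loop: double every amount
doubled_loop <- numeric(length(amounts))
for (k in seq_along(amounts)) {
  doubled_loop[k] <- amounts[k] * 2
}

# The same computation written with sapply
doubled_sapply <- sapply(amounts, function(x) x * 2)

# Fully vectorized version, usually the fastest of the three
doubled_vec <- amounts * 2

# tapply: total amount per service type
total_by_service <- tapply(amounts, service, sum)
```

All three `doubled_*` vectors hold the same values; `total_by_service`
holds one sum per service type.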

Second, could you elaborate a bit further on the data set: is the amount
of a month ago a single value from another row, or the sum of
all values in the previous month? I saw in your example dataset that
the last month has 2 rows, but couldn't figure out whether that's a
typo or really means something. That information is needed to
optimize your code. 129s is indeed far too long for such a simple operation.
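
If it turns out to be the second case (the sum of all values in the
previous month, per service type), one possible loop-free approach is
to aggregate once and then join the totals back on. This is only a
sketch under that assumption: the data frame `d` and its column names
are hypothetical and simply mirror your example table. If the
department matters as well, it could be added to the grouping and join
keys in the same way.

```r
# Hypothetical data frame mirroring the example table; names are assumptions.
d <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# Total amount per (year, month, service), computed in a single pass
totals <- aggregate(
  d["amount"],
  by  = list(year = d$year, month = d$month, service = d$service),
  FUN = sum
)

# Shift each total one month forward so that it lines up with the rows
# that should see it as "last month"; December rolls over into January.
# The year is updated first because it depends on the unshifted month.
totals$year  <- ifelse(totals$month == 12, totals$year + 1, totals$year)
totals$month <- ifelse(totals$month == 12, 1, totals$month + 1)
names(totals)[names(totals) == "amount"] <- "amount_lastmonth"

# Left join: every original row is kept; rows without a previous month get NA
result <- merge(d, totals, by = c("year", "month", "service"), all.x = TRUE)
```

Note that merge() sorts the result by the key columns, so the row order
of `result` differs from that of `d`.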

Cheers
Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems
<ian.willems at uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> For each row, I want to calculate the amount from the previous month. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; the number of rows of other matrices can easily be more than 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...  80
> 2009  8      Q3       082009     A        ...  40
> 2009  7      Q3       072009     A        ...  50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...  80      ...
> 2009  8      Q3       082009     A        ...  40      50
> 2009  7      Q3       072009     A        ...  50      ...
>
> The table is not exactly the same, but it gives a good idea of what I have and what I want.
>
> The code I have written (see below) does what I want, but it is very slow. It takes 129s for 400 rows, and the time quadruples each time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output below).
> However, I don't really understand the output.
> I guess I need code that runs in linear time, and that I need to get rid of the 2 for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2
> on Windows XP Service Pack 3
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,3]  = year
> dataset[,5]  = month
> dataset[,14] = servicetype
> dataset[,18] = department
> dataset[,22] = amount
>
> [CODE]
> # For each row j of the matrix, sum the amounts of all rows i that have
> # the same service type and department and fall in the month before row j.
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {              # the same service type
>       if (dataset[j,18] == dataset[i,18]) {            # the same department
>         if (dataset[j,5] == "1") {                     # if month = 1, month ago is 12 and year is year - 1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3] - 1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {
>           if ((dataset[j,5] - 1) == dataset[i,5]) {    # if month != 1, month ago is month - 1
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
> [/CODE]
>
>> summaryRprof()
> $by.self
>               self.time self.pct total.time total.pct
> [.data.frame       33.92  26.2    80.90      62.5
> NextMethod         12.68  9.8     12.68       9.8
> [.factor            8.60  6.6      18.36      14.2
> Ops.factor          8.10  6.3      40.08      31.0
> sort.int            6.82  5.3      13.70      10.6
> [                   6.70  5.2      85.44      66.0
> names               6.54  5.1       6.54       5.1
> length              5.66  4.4       5.66       4.4
> ==                  5.04  3.9      44.92      34.7
> levels              4.80  3.7       5.56       4.3
> is.na               4.24  3.3       4.24       3.3
> dim                 3.66  2.8       3.66       2.8
> switch              3.60  2.8       3.80       2.9
> vector              2.68  2.1       8.02       6.2
> inherits            1.90  1.5       1.90       1.5
> any                 1.68  1.3       1.68       1.3
> noNA.levels         1.46  1.1       7.84       6.1
> .Call               1.40  1.1       1.40       1.1
> !                   1.26  1.0       1.26       1.0
> attr<-              1.06  0.8       1.06       0.8
> .subset             1.00  0.8       1.00       0.8
> class<-             0.82  0.6       0.82       0.6
> !=                  0.80  0.6       0.80       0.6
> levels.default      0.68  0.5       0.76       0.6
> all                 0.62  0.5       0.62       0.5
> <                   0.54  0.4       0.54       0.4
> -                   0.48  0.4       0.48       0.4
> is.factor           0.44  0.3       2.34       1.8
> .subset2            0.38  0.3       0.38       0.3
> attr                0.36  0.3       0.36       0.3
> is.character        0.28  0.2       0.28       0.2
> is.null             0.28  0.2       0.28       0.2
> |                   0.26  0.2       0.26       0.2
> oldClass<-          0.20  0.2       0.20       0.2
> is.atomic           0.16  0.1       0.16       0.1
> nzchar              0.10  0.1       0.10       0.1
> is.numeric          0.06  0.0       0.06       0.0
> oldClass            0.06  0.0       0.06       0.0
> (                   0.04  0.0       0.04       0.0
> [.data              0.02  0.0       0.02       0.0
>
> $by.total
>               total.time total.pct self.time self.pct
> [                   85.44  66.0      6.70      5.2
> [.data.frame        80.90  62.5     33.92     26.2
> ==                  44.92  34.7      5.04      3.9
> Ops.factor          40.08  31.0      8.10      6.3
> [.factor            18.36  14.2      8.60      6.6
> sort.int            13.70  10.6      6.82      5.3
> NextMethod          12.68  9.8     12.68      9.8
> vector               8.02  6.2      2.68      2.1
> noNA.levels          7.84  6.1      1.46      1.1
> names                6.54  5.1      6.54      5.1
> length               5.66  4.4      5.66      4.4
> levels               5.56  4.3      4.80      3.7
> is.na                4.24  3.3      4.24      3.3
> switch               3.80  2.9      3.60      2.8
> dim                  3.66  2.8      3.66      2.8
> is.factor            2.34  1.8      0.44      0.3
> inherits             1.90  1.5      1.90      1.5
> any                  1.68  1.3      1.68      1.3
> .Call                1.40  1.1      1.40      1.1
> !                    1.26  1.0      1.26      1.0
> attr<-               1.06  0.8      1.06      0.8
> .subset              1.00  0.8      1.00      0.8
> class<-              0.82  0.6      0.82      0.6
> !=                   0.80  0.6      0.80      0.6
> levels.default       0.76  0.6      0.68      0.5
> all                  0.62  0.5      0.62      0.5
> <                    0.54  0.4      0.54      0.4
> -                    0.48  0.4      0.48      0.4
> .subset2             0.38  0.3      0.38      0.3
> attr                 0.36  0.3      0.36      0.3
> is.character         0.28  0.2      0.28      0.2
> is.null              0.28  0.2      0.28      0.2
> |                    0.26  0.2      0.26      0.2
> oldClass<-           0.20  0.2      0.20      0.2
> is.atomic            0.16  0.1      0.16      0.1
> nzchar               0.10  0.1      0.10      0.1
> is.numeric           0.06  0.0      0.06      0.0
> oldClass             0.06  0.0      0.06      0.0
> (                    0.04  0.0      0.04      0.0
> [.data               0.02  0.0      0.02      0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
>  incomplete final line found on 'Rprof.out'



