[R] Speeding up a loop

Rui Barradas ruipbarradas at sapo.pt
Sat Jul 21 20:53:33 CEST 2012


Ok, sorry, I should have included some comments.

The function is divided into three parts: 1. intro, 2. decision, 3. keep rows.
Part 3 is the function keep(), internal to to.keep(). Let's start with 1.

1. Set up some variables first.
1.a) The 'a' variables (a1 to a4).
If the input object 'x' is a matrix this doesn't give much of a speed-up, 
but if 'x' is a data.frame, extracting values inside the loop is time 
consuming.
So do the extraction only once, at the beginning.
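
As a rough sketch of 1.a, assuming 'x' has columns named a1 to a4 (the 
data and its size are made up here, only for illustration):

```r
# Hypothetical data; the column names a1..a4 follow the condition in part 2.
set.seed(1)
x <- data.frame(a1 = runif(100), a2 = runif(100),
                a3 = runif(100), a4 = runif(100))

# Extract each column once, before the loop, as plain numeric vectors.
a1 <- x$a1
a2 <- x$a2
a3 <- x$a3
a4 <- x$a4

# Inside a loop, a1[i] indexes a vector directly, instead of going
# through the data.frame extraction method x[i, "a1"] every time.
i <- 10
same <- identical(a1[i], x[i, "a1"])
```

Both forms give the same value; the vector form just avoids repeating the 
data.frame extraction on every iteration.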
1.b) The new environment.
This is because my first version would need to change values declared 
outside the internal function.
This can be done with the superassignment operator, <<- (often called 
global assignment), but this practice should be avoided; it's easy to 
mess things up.
Note that all the variables changed inside the internal function are in 
this new environment, 'e'.
In particular, note that 'result' is initialized with 1000 rows.
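
A small sketch of the idea in 1.b. The function count_positive() and its 
helper bump() are hypothetical stand-ins, not part of to.keep(); they only 
show an internal function changing state held in an environment:

```r
# Hypothetical example: an internal function updates state stored in an
# environment instead of using <<-.
count_positive <- function(v) {
  e <- new.env()
  e$n <- 0L                      # state the internal function will change

  bump <- function(env) {
    env$n <- env$n + 1L          # modifies 'e' in place; no <<- needed
    invisible(NULL)
  }

  for (i in seq_along(v)) {
    if (v[i] > 0) bump(e)
  }
  e$n
}

n_pos <- count_positive(c(-1, 2, 3, -4, 5))
```

Environments are passed by reference, so the change made inside bump() is 
visible to the caller without returning anything.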

2. The loop.
This is where we decide whether to keep row i. The original condition 
was a 'no' (don't keep) condition, and I have negated it.
The 'no' condition:
     a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
Then the test would be:
     if(any(no)) dont_keep else keep.  # pseudo-code
Not in pseudo-code:
     if( all( !no ) ) keep(i, e)
The downside is that the original form is more readable.
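
A quick check that the two forms of the test agree, using made-up values 
for a1 to a4 (assumed here to contain no NA values):

```r
# Made-up values for a1..a4, only to exercise the condition.
set.seed(42)
n  <- 20
a1 <- runif(n); a2 <- runif(n); a3 <- runif(n); a4 <- runif(n)

i  <- 5
# Logical vector of length n: row i compared against every row.
no <- a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4

readable <- !any(no)   # "if any 'no' is TRUE, don't keep"
negated  <- all(!no)   # the form used in the function

same <- identical(readable, negated)
```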

3. The internal function, keep().
Considering the small number of rows I have used for tests, e$result was 
initialized with 1e3 rows.
With 5e5 lines I would increase this number to 1e5.
First, the function updates the [row number] pointer into 'result' and 
checks if we are at a 'result' limit.
If yes, it makes 'result' bigger by e$increment [ == 1e3 ] rows.
Then just assign row i from matrix/df 'x' to the appropriate row of 
e$result.
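
The steps above can be sketched roughly as follows. The names ires, 
curr.rows, result and increment follow the description, but the body is an 
assumption and not the original code; the increment is set to 3 instead of 
1e3 only to force the growth step with a tiny made-up input:

```r
# Sketch only, under the assumptions stated above.
x <- matrix(seq_len(40), ncol = 4)        # made-up input, 10 rows

e <- new.env()
e$increment <- 3L                         # tiny on purpose, to force growth
e$curr.rows <- e$increment
e$result    <- matrix(NA_integer_, nrow = e$curr.rows, ncol = ncol(x))
e$ires      <- 0L                         # last filled row of 'result'

# In to.keep() this would be an internal function, so 'x' is visible to it.
keep <- function(i, env) {
  env$ires <- env$ires + 1L
  if (env$ires > env$curr.rows) {         # 'result' is full: add more rows
    extra <- matrix(NA_integer_, nrow = env$increment,
                    ncol = ncol(env$result))
    env$result    <- rbind(env$result, extra)
    env$curr.rows <- env$curr.rows + env$increment
  }
  env$result[env$ires, ] <- x[i, ]
  invisible(NULL)
}

rows <- c(2, 4, 5, 7, 9)                  # pretend these rows passed the test
for (i in rows) keep(i, e)

# Drop the unused preallocated rows at the end.
kept <- e$result[seq_len(e$ires), , drop = FALSE]
```

Growing in blocks means rbind() is called once per e$increment kept rows 
rather than once per kept row.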
The reason we need the environment is that on function return, 
everything but the returned value is lost.
We could instead save the values of ires, curr.rows and result in a list 
and return that list.
But this would complicate and slow things down. Assign, update and 
reassign. Messy.
Environments can help keep it "simple", in the sense of keeping together 
what is meant to be used together.
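
For contrast, a hypothetical list-returning variant might look like this 
(keep_list() and 'state' are illustrative names, and the growth step is 
left out to keep it short):

```r
# Hypothetical list-passing variant, shown only for contrast.
keep_list <- function(i, state, x) {
  state$ires <- state$ires + 1L
  state$result[state$ires, ] <- x[i, ]
  state                                   # must return the whole state
}

x <- matrix(seq_len(12), ncol = 3)        # made-up input, 4 rows
state <- list(ires = 0L,
              result = matrix(NA_integer_, nrow = 4, ncol = 3))

for (i in c(1, 3)) {
  state <- keep_list(i, state, x)         # assign, update and reassign
}
```

Every call has to hand the whole state back and the caller has to store it 
again, which is the bookkeeping the environment avoids.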

And now I hope there is not an overdose of comments :)

Rui Barradas

On 21-07-2012 18:37, wwreith wrote:
> Any chance I could ask for an idiots guide for function to.keep(x). I
> understand how to use it but not what some of the lines are doing. Comments
> would be extremely helpful.