[R] where/what is i? for loop (black?) magic

Duncan Murdoch murdoch at stats.uwo.ca
Thu Jun 18 05:40:56 CEST 2009


Liaw, Andy wrote:
> A colleague and I were trying to understand all the possible things one
> can do with for loops in R, and found some surprises.  I think we've
> done sufficient detective work to have a good guess as to what's going
> on "underneath", but it would be nice to get some confirmation, and
> better yet, perhaps documentation in the R-lang manual.  Basically, the
> question is, how/what does R do with the loop index variable?  Below are
> some examples:
>   

I think it is documented in the ?Control topic that a copy of the seq 
argument (the 1:2 in your first example) is made at the beginning, and that
altering var (your i) doesn't affect the loop.  One other thing you 
didn't investigate is the value of an expression like

 loopval <- for (i in 1:2) { i }

This sets loopval to 2, but in R-devel (to become 2.10.0) this has changed:
loops now have NULL as their value.
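A small check of both points follows. It is a sketch that assumes R 2.10.0 or later, where a loop's value is NULL. The names `x` and `seen` are illustrative only.

```r
# The seq argument is evaluated once, when the loop starts
x <- 1:3
seen <- integer(0)
for (i in x) {
  seen <- c(seen, i)  # record the value the loop assigned to i
  x <- 0L             # rebinding x does not change what the loop iterates over
  i <- 100L           # reassigning i lasts only until the next iteration starts
}
stopifnot(identical(seen, 1:3))  # all three original elements were visited
stopifnot(identical(i, 100L))    # i keeps whatever the last iteration left in it

# Since R 2.10.0 a for loop has value NULL (returned invisibly)
loopval <- for (i in 1:2) { i }
stopifnot(is.null(loopval))
```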


> R> for (i in 1:2) { i <- 17; print(i) }
> [1] 17
> [1] 17
> R> print(i)
> [1] 17
> R> x <- 1:2
> R> for (i in x) { print(i); rm(i) }
> [1] 1
> [1] 2
> R> i
> Error: object 'i' not found
> R> for (i in x) { print(i); rm(x) }
> [1] 1
> [1] 2
> Warning message:
> In rm(x) : object 'x' not found
> R> i
> [1] 2
> R> x <- 1:2
> R> for (i in x) { print(i); i <- 17; print(i) }
> [1] 1
> [1] 17
> [1] 2
> [1] 17
>
> The guess is that at the beginning of the loop, R makes a copy of the
> object that's being looped over ("x" in examples above) somewhere "under
> cover", and at the beginning of each iteration, assigns the "current"
> element to the index variable ("i" in the examples above).  This is the
> only logical explanation I can come up with given the behavior observed
> above.  Can anyone confirm/deny this?  If this is true, one thing to
> consider is not to use a large object to loop over (e.g., columns of a
> very large data frame).
>   

It is uncommon to modify seq (your x) in the loop.  In the usual case 
where you don't modify it, the fact that the loop has made a copy should 
not matter:  R won't actually copy the complete object until one version 
of it is changed.
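The behaviour Andy guessed at can be sketched by rewriting a for loop as a while loop over a hidden reference to seq. This is only an illustrative emulation under that assumption, not how R actually implements `for` internally; the helper names `.seq` and `.k` are hypothetical.

```r
x <- 1:2
out <- integer(0)

# Roughly what  for (i in x) { out <- c(out, i); i <- 17L; x <- NULL }  does:
.seq <- x            # hidden reference taken once; values are shared, not duplicated
.k <- 0L
while (.k < length(.seq)) {
  .k <- .k + 1L
  i <- .seq[[.k]]    # the index variable is (re)assigned at the top of each pass
  out <- c(out, i)
  i <- 17L           # overwritten at the start of the next pass
  x <- NULL          # changing x afterwards does not affect .seq
}
stopifnot(identical(out, 1:2))  # both original elements were visited
stopifnot(is.null(x))
stopifnot(identical(i, 17L))    # i holds the last value the body assigned
```

Because `length()` and `[[` also work on lists, the same sketch covers looping over the columns of a data frame.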

So this sequence

seq <- data.frame(a=1:1000000, b=1:1000000)
for (var in seq) { print(var[1]) }

hardly uses any more memory during the loop than it used in creating 
seq, but this sequence

for (var in seq) { seq$b[1] <- -1; print(var[1]) }

uses a lot more:  seq is modified so a copy is made, and seq$b is 
modified after var is set to it, so a copy is made of that too.  Both of 
the loops print two 1's, by the way.
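A scaled-down version of the second loop confirms the printed values; it does not measure memory. It uses 5 rows instead of 1,000,000 and renames the data frame `df` so that it does not mask base R's `seq()` function. The name `firsts` is illustrative only.

```r
df <- data.frame(a = 1:5, b = 1:5)
firsts <- numeric(0)
for (var in df) {              # a data frame is a list, so var is each column in turn
  df$b[1] <- -1                # the loop still references the original, so R copies df
  firsts <- c(firsts, var[1])  # var comes from the value taken at loop start
}
stopifnot(identical(firsts, c(1, 1)))  # two 1's, as in the large example
stopifnot(df$b[1] == -1)               # the change did happen to df itself
```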

Duncan Murdoch
> Andy
