[R] where/what is i? for loop (black?) magic

Liaw, Andy andy_liaw at merck.com
Thu Jun 18 15:25:42 CEST 2009


From: Duncan Murdoch
> 
> Liaw, Andy wrote:
> > A colleague and I were trying to understand all the possible things one
> > can do with for loops in R, and found some surprises.  I think we've
> > done sufficient detective work to have a good guess as to what's going
> > on "underneath", but it would be nice to get some confirmation, and
> > better yet, perhaps documentation in the R-lang manual.  Basically, the
> > question is, how/what does R do with the loop index variable?  Below are
> > some examples:
> >   
> 
> I think it is documented in the ?Control topic that a copy of the seq
> argument (the 1:2 in your first example) is made at the beginning, and
> that altering var (your i) doesn't affect the loop.  One other thing you
> didn't investigate is what is the value of an expression like
> 
>  loopval <- for (i in 1:2) { i }
> 
> This sets loopval to 2, but in R-devel (2.10.0 to be) this has changed:
> loops now have NULL as their value.

Thanks to Duncan (as well as Brian Ripley and Bill Dunlap, who replied
off-list) for the explanation.  As Brian pointed out, such a topic is
better suited to R-devel, but since I started it here, I thought I might
as well wrap it up here.

I failed to mention the version of R I was using (it was 2.9.0 patched
2009-04-20 r48365), thinking that something as basic as the behavior of
for loops wasn't likely to change.  I was wrong: as Duncan and Brian
pointed out, there are a couple of entries in the NEWS for
R-2.10.0-to-be related to for loops.  However, at least on the surface
the behavior has not changed for my examples (I just checked the Windows
build of R-devel 2009-06-16 r48790 and got exactly the same results).
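
What did change is the point Duncan raised in the quoted text above: the
value of the loop expression itself.  A minimal sketch for checking it,
assuming R 2.10.0 or later (loopval is the name from Duncan's example):

```r
# Capture the value of the for expression itself, as in Duncan's example.
loopval <- for (i in 1:2) { i }

is.null(loopval)   # TRUE on R >= 2.10.0; the loop's value is NULL
i                  # 2; the index variable still holds the last element
```

On 2.9.0, per Duncan's description, loopval would have been 2 instead.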

Regarding documentation, Duncan was of course right.  ?"for" would have
told me:

"The seq in a for loop is evaluated at the start of the loop; changing
it subsequently does not affect the loop. If seq has length zero the
body of the loop is skipped. Otherwise the variable var is assigned in
turn the value of each element of seq. You can assign to var within the
body of the loop, but this will not affect the next iteration. When the
loop terminates, var remains as a variable containing its latest value."

[Note, however, that the last sentence isn't true, at least as I
understand it, because I still get the following in R-devel.  I believe
this is the correct behavior.  Perhaps the help page needs some editing?]

R> for (i in 1:2) { i <- 17; print(i) }
[1] 17
[1] 17
R> print(i)
[1] 17
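
The other statements in the help text can be checked the same way.  A
minimal sketch; the names x, visited and ran are mine, chosen only for
illustration:

```r
x <- 1:3
visited <- integer(0)
for (i in x) {
  x <- 10:20                 # reassigning x does not change what is looped over
  i <- i * 100L              # assigning to i does not affect the next iteration
  visited <- c(visited, i)
}
visited                      # 100 200 300: three iterations over the original 1:3
x                            # 10:20: the variable itself was changed

ran <- FALSE
for (j in integer(0)) ran <- TRUE   # zero-length seq: the body is skipped
ran                          # FALSE
```

The loop still runs three times because seq was evaluated once, at the
start, before x was reassigned.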


As Duncan pointed out below, as long as one is careful, no unnecessary
copying is done, so my worry was mostly unfounded.
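
For the record, here is a scaled-down sketch of Duncan's second example
(quoted below), with five rows instead of a million.  The names seq_df
and first_vals are mine, not Duncan's.  It only shows that the loop keeps
seeing the original columns after the variable is modified; it does not
measure memory use.

```r
seq_df <- data.frame(a = 1:5, b = 1:5)
first_vals <- integer(0)
for (var in seq_df) {
  seq_df$b[1] <- -1L                   # changes the variable, not the loop's copy
  first_vals <- c(first_vals, var[1])  # record what the loop actually sees
}
first_vals    # 1 1: both columns as they were when the loop started
seq_df$b[1]   # -1: the modification did reach the variable
```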

Again, thanks to the wizaRds for the enlightenment!

Best,
Andy
 
> 
> > R> for (i in 1:2) { i <- 17; print(i) }
> > [1] 17
> > [1] 17
> > R> print(i)
> > [1] 17
> > R> x <- 1:2
> > R> for (i in x) { print(i); rm(i) }
> > [1] 1
> > [1] 2
> > R> i
> > Error: object 'i' not found
> > R> for (i in x) { print(i); rm(x) }
> > [1] 1
> > [1] 2
> > Warning message:
> > In rm(x) : object 'x' not found
> > R> i
> > [1] 2
> > R> x <- 1:2
> > R> for (i in x) { print(i); i <- 17; print(i) }
> > [1] 1
> > [1] 17
> > [1] 2
> > [1] 17
> >
> > The guess is that at the beginning of the loop, R makes a copy of the
> > object that's being looped over ("x" in examples above) somewhere "under
> > cover", and at the beginning of each iteration, assigns the "current"
> > element to the index variable ("i" in the examples above).  This is the
> > only logical explanation I can come up with given the behavior observed
> > above.  Can anyone confirm/deny this?  If this is true, one thing to
> > consider is not to use a large object to loop over (e.g., columns of a
> > very large data frame).
> >   
> 
> It is uncommon to modify seq (your x) in the loop.  In the usual case
> where you don't modify it, the fact that the loop has made a copy should
> not matter:  R won't actually copy the complete object until one version
> of it is changed.
> 
> So this sequence
> 
> seq <- data.frame(a=1:1000000, b=1:1000000)
> for (var in seq) { print(var[1]) }
> 
> hardly uses any more memory during the loop than it used in creating 
> seq, but this sequence
> 
> for (var in seq) { seq$b[1] <- -1; print(var[1]) }
> 
> uses a lot more:  seq is modified so a copy is made, and seq$b is
> modified after var is set to it, so a copy is made of that too.  Both of
> the loops print two 1's, by the way.
> 
> Duncan Murdoch
> > Andy
> 