[Rd] Speed of for loops
Byron Ellis
byron.ellis at gmail.com
Wed Jan 31 01:01:59 CET 2007
IIRC a for loop has more per-iteration overhead that lapply, but the
real answer is "it depends on what you're doing exactly." I've seen it
be a faster, slower and equal approach.
On 1/30/07, Oleg Sklyar <osklyar at ebi.ac.uk> wrote:
> It is surely an elegant way of doing things (although far from being
> easy to parse visually) but is it really faster than a loop?
>
> After all, the indexing problem is the same and sapply simply does the
> same job as for in this case, plus "<<-" will _search_ through the
> environment on every single step. Where is the gain?
>
> Oleg
>
> --
> Dr Oleg Sklyar | EBI-EMBL, Cambridge CB10 1SD, UK | +44-1223-494466
>
>
> Byron Ellis wrote:
> > Actually, why not use a closure to store previous value(s)?
> >
> > In the simple case, which depends on x_i and y_{i-1}
> >
> > gen.iter = function(x) {
> > y = NA
> > function(i) {
> > y <<- if(is.na(y)) x[i] else y+x[i]
> > }
> > }
> >
> > y = sapply(1:10,gen.iter(x))
> >
> > Obviously you can modify the function for the bookkeeping required to
> > manage whatever lag you need. I use this sometimes when I'm
> > implementing MCMC samplers of various kinds.
> >
> >
> > On 1/30/07, Herve Pages <hpages at fhcrc.org> wrote:
> >> Tom McCallum wrote:
> >>> Hi Everyone,
> >>>
> >>> I have a question about for loops. If you have something like:
> >>>
> >>> f <- function(x) {
> >>> y <- rep(NA,10);
> >>> for( i in 1:10 ) {
> >>> if ( i > 3 ) {
> >>> if ( is.na(y[i-3]) == FALSE ) {
> >>> # some calculation F which depends on one or more of the previously
> >>> generated values in the series
> >>> y[i] = y[i-1]+x[i];
> >>> } else {
> >>> y[i] <- x[i];
> >>> }
> >>> }
> >>> }
> >>> y
> >>> }
> >>>
> >>> e.g.
> >>>
> >>>> f(c(1,2,3,4,5,6,7,8,9,10,11,12));
> >>> [1] NA NA NA 4 5 6 13 21 30 40
> >>>
> >>> is there a faster way to process this than with a 'for' loop? I have
> >>> looked at lapply as well but I have read that lapply is no faster than a
> >>> for loop and for my particular application it is easier to use a for loop.
> >>> Also I have seen 'rle' which I think may help me but am not sure as I have
> >>> only just come across it, any ideas?
> >> Hi Tom,
> >>
> >> In the general case, you need a loop in order to propagate calculations
> >> and their results across a vector.
> >>
> >> In _your_ particular case however, it seems that all you are doing is a
> >> cumulative sum on x (at least this is what's happening for i >= 6).
> >> So you could do:
> >>
> >> f2 <- function(x)
> >> {
> >> offset <- 3
> >> start_propagate_at <- 6
> >> y_length <- 10
> >> init_range <- (offset+1):start_propagate_at
> >> y <- rep(NA, offset)
> >> y[init_range] <- x[init_range]
> >> y[start_propagate_at:y_length] <- cumsum(x[start_propagate_at:y_length])
> >> y
> >> }
> >>
> >> and it will return the same thing as your function 'f' (at least when 'x' doesn't
> >> contain NAs) but it's not faster :-/
> >>
> >> IMO, using sapply for propagating calculations across a vector is not appropriate
> >> because:
> >>
> >> (1) It requires special care. For example, this:
> >>
> >> > x <- 1:10
> >> > sapply(2:length(x), function(i) {x[i] <- x[i-1]+x[i]})
> >>
> >> doesn't work because the 'x' symbol on the left side of the <- in the
> >> anonymous function doesn't refer to the 'x' symbol defined in the global
> >> environment. So you need to use tricks like this:
> >>
> >> > sapply(2:length(x),
> >> function(i) {x[i] <- x[i-1]+x[i]; assign("x", x, envir=.GlobalEnv); x[i]})
> >>
> >> (2) Because of this kind of tricks, then it is _very_ slow (about 10 times
> >> slower or more than a 'for' loop).
> >>
> >> Cheers,
> >> H.
> >>
> >>
> >>> Many thanks
> >>>
> >>> Tom
> >>>
> >>>
> >>>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >
>
--
Byron Ellis (byron.ellis at gmail.com)
"Oook" -- The Librarian
More information about the R-devel
mailing list