[R] nested for loops too slow

Sun Apr 12 23:32:09 CEST 2015

Well, sort of...

aggregate() is basically a wrapper for lapply(), which ultimately must loop
over the function call at the R interpreter level, as opposed to vectorized
functions that loop at the C level and hence can be orders of magnitude
faster. As a result, there is often little difference in efficiency between
explicit and *smart* (in the sense that Pat Burns has already pointed out
of not growing structures at each iteration,among other things)  for()
looping and apply-type calls. For some of us, the chief advantage of the
*apply idioms is that the code is more readable and maintainable, with R
handling fussy details of loop indexing, for example.lapply() is also more
in keeping with the functional programming paradigm.
Others
find both these "virtues" to be annoyances,

however, and prefer explicit *smart* looping. Chaque un á

son goû
t
.

None of which necessarily denies the wisdom of the approach you've
suggested, however. It may indeed be considerably faster,
but timing will have to tell. I am just trying to correct
(again) 
the

widely held misperception
that yo
u
 seem to
express
.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll

On Sun, Apr 12, 2015 at 1:48 PM, Thierry Onkelinx <thierry.onkelinx at inbo.be>
wrote:

> You don't need loops at all.
>
>     grw <- aggregate(gw ~ ts + ISEG + iter, data = dat, FUN = sum)
>     GRW <- aggregate(gw ~ ts + ISEG, data = grw, FUN = function(x){max(x) -
> min(x)})
>     DC <- aggregate(div ~ ts + ISEG, data = subset(dat, IRCH == 1), FUN =
> function(x){max(x) - min(x)})
>     iter <- aggregate(iter ~ ts + ISEG, data = subset(dat, IRCH == 1), FUN
> = max)
>     tmp <- merge(DC, iter)
>     merge(tmp, GRW)
>
> another option is to use the plyr package
>
>     library(plyr)
>     merge(
>       ddply(
>         subset(dat, IRCH == 1),
>         c("ts", "ISEG"),
>         summarize,
>         divChng = max(div) - min(div),
>         max.iter = max(iter)
>       ),
>       ddply(
>         dat,
>         c("ts", "ISEG"),
>         summarize,
>         gwChng = diff(range(ave(gw, iter, FUN = sum)))
>       )
>     )
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-04-12 15:47 GMT+02:00 Morway, Eric <emorway at usgs.gov>:
>
> > The small example below works lighting-fast; however, when I run the same
> > script on my real problem, a 1Gb text file, the for loops have been
> running
> > for over 24 hrs and I have no idea if the processing is 10% done or 90%
> > done.  I have not been able to figure out a betteR way to code up the
> > material within the for loops at the end of the example below.  The
> > contents of divChng, the final product, are exactly what I'm after, but I
> > need help formulating more efficient R script, I've got two more 1Gb
> files
> > to process after the current one finishes, whenever that is...
> >
> > I appreciate any insights/solutions, Eric
> >
> > dat <- read.table(textConnection("ISEG  IRCH  div  gw
> > 1  1  265  229
> > 1  2  260  298
> > 1  3  234  196
> > 54  1  432  485
> > 54  39  467  485
> > 54  40  468  468
> > 54  41  460  381
> > 54  42  489  502
> > 1  1  265  317
> > 1  2  276  225
> > 1  3  217  164
> > 54  1  430  489
> > 54  39  456  495
> > 54  40  507  607
> > 54  41  483  424
> > 54  42  457  404
> > 1  1  265  278
> > 1  2  287  370
> > 1  3  224  274
> > 54  1  412  585
> > 54  39  473  532
> > 54  40  502  595
> > 54  41  497  441
> > 54  42  447  467
> > 1  1  230  258
> > 1  2  251  152
> > 1  3  199  179
> > 54  1  412  415
> > 54  39  439  538
> > 54  40  474  486
> > 54  41  477  484
> > 54  42  413  346
> > 1  1  230  171
> > 1  2  262  171
> > 1  3  217  263
> > 54  1  432  485
> > 54  39  455  482
> > 54  40  493  419
> > 54  41  489  536
> > 54  42  431  504
> > 1  1  1002  1090
> > 1  2  1222  1178
> > 1  3  1198  1177
> > 54  1  1432  1485
> > 54  39  1876  1975
> > 54  40  1565  1646
> > 54  41  1455  1451
> > 54  42  1427  1524
> > 1  1  1002  968
> > 1  2  1246  1306
> > 1  3  1153  1158
> > 54  1  1532  1585
> > 54  39  1790  1889
> > 54  40  1490  1461
> > 54  41  1518  1536
> > 54  42  1486  1585
> > 1  1  1002  1081
> > 1  2  1229  1262
> > 1  3  1142  1241
> > 54  1  1632  1659
> > 54  39  1797  1730
> > 54  40  1517  1466
> > 54  41  1527  1589
> > 54  42  1514  1612"),header=TRUE)
> >
> > dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
> > tmp <- diff(dat[dat$seq==1,]$div)!=0
> > dat$idx <- 0
> > dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
> > dat$ts <- cumsum(dat$idx)
> > dat$iter <- ave(dat$seq, dat$ts,FUN=cumsum)
> > dat$ct <- seq(1:length(dat[,1]))
> >
> > timeStep <- unique(dat$ts)
> > SEG <- unique(dat$ISEG)
> > divChng <- data.frame(ts=NA, ISEG=NA, divChng=NA, gwChng=NA, iter=NA)
> >
> > #Can the following be rescripted for better harnessing R's processing
> > power?
> >
> > for (i in 1:length(timeStep)){
> >   for (j in 1:length(SEG)){
> >     datTS <- subset(dat,ts==timeStep[i] & ISEG==SEG[j] & IRCH==1)
> >     datGW <- subset(dat,ts==timeStep[i] & ISEG==SEG[j])
> >     grw <- aggregate(gw ~ iter, datGW, sum)
> >
> >     DC <- max(datTS$div)-min(datTS$div)
> >     GRW <- max(grw$gw) - min(grw$gw)
> >     divChng <- rbind(divChng,c(datTS$ts[1], SEG[j], DC, GRW,
> > max(datTS$iter)))
> >   }
> > }
> > divChng <- divChng[!is.na(divChng$ISEG),]
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]