[R] nested for loops too slow
Thierry Onkelinx
thierry.onkelinx at inbo.be
Sun Apr 12 22:48:11 CEST 2015
You don't need loops at all.
grw <- aggregate(gw ~ ts + ISEG + iter, data = dat, FUN = sum)
GRW <- aggregate(gw ~ ts + ISEG, data = grw, FUN = function(x){max(x) -
min(x)})
DC <- aggregate(div ~ ts + ISEG, data = subset(dat, IRCH == 1), FUN =
function(x){max(x) - min(x)})
iter <- aggregate(iter ~ ts + ISEG, data = subset(dat, IRCH == 1), FUN
= max)
tmp <- merge(DC, iter)
merge(tmp, GRW)
another option is to use the plyr package
library(plyr)
merge(
ddply(
subset(dat, IRCH == 1),
c("ts", "ISEG"),
summarize,
divChng = max(div) - min(div),
max.iter = max(iter)
),
ddply(
dat,
c("ts", "ISEG"),
summarize,
gwChng = diff(range(ave(gw, iter, FUN = sum)))
)
)
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2015-04-12 15:47 GMT+02:00 Morway, Eric <emorway op usgs.gov>:
> The small example below works lighting-fast; however, when I run the same
> script on my real problem, a 1Gb text file, the for loops have been running
> for over 24 hrs and I have no idea if the processing is 10% done or 90%
> done. I have not been able to figure out a betteR way to code up the
> material within the for loops at the end of the example below. The
> contents of divChng, the final product, are exactly what I'm after, but I
> need help formulating more efficient R script, I've got two more 1Gb files
> to process after the current one finishes, whenever that is...
>
> I appreciate any insights/solutions, Eric
>
> dat <- read.table(textConnection("ISEG IRCH div gw
> 1 1 265 229
> 1 2 260 298
> 1 3 234 196
> 54 1 432 485
> 54 39 467 485
> 54 40 468 468
> 54 41 460 381
> 54 42 489 502
> 1 1 265 317
> 1 2 276 225
> 1 3 217 164
> 54 1 430 489
> 54 39 456 495
> 54 40 507 607
> 54 41 483 424
> 54 42 457 404
> 1 1 265 278
> 1 2 287 370
> 1 3 224 274
> 54 1 412 585
> 54 39 473 532
> 54 40 502 595
> 54 41 497 441
> 54 42 447 467
> 1 1 230 258
> 1 2 251 152
> 1 3 199 179
> 54 1 412 415
> 54 39 439 538
> 54 40 474 486
> 54 41 477 484
> 54 42 413 346
> 1 1 230 171
> 1 2 262 171
> 1 3 217 263
> 54 1 432 485
> 54 39 455 482
> 54 40 493 419
> 54 41 489 536
> 54 42 431 504
> 1 1 1002 1090
> 1 2 1222 1178
> 1 3 1198 1177
> 54 1 1432 1485
> 54 39 1876 1975
> 54 40 1565 1646
> 54 41 1455 1451
> 54 42 1427 1524
> 1 1 1002 968
> 1 2 1246 1306
> 1 3 1153 1158
> 54 1 1532 1585
> 54 39 1790 1889
> 54 40 1490 1461
> 54 41 1518 1536
> 54 42 1486 1585
> 1 1 1002 1081
> 1 2 1229 1262
> 1 3 1142 1241
> 54 1 1632 1659
> 54 39 1797 1730
> 54 40 1517 1466
> 54 41 1527 1589
> 54 42 1514 1612"),header=TRUE)
>
> dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
> tmp <- diff(dat[dat$seq==1,]$div)!=0
> dat$idx <- 0
> dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
> dat$ts <- cumsum(dat$idx)
> dat$iter <- ave(dat$seq, dat$ts,FUN=cumsum)
> dat$ct <- seq(1:length(dat[,1]))
>
> timeStep <- unique(dat$ts)
> SEG <- unique(dat$ISEG)
> divChng <- data.frame(ts=NA, ISEG=NA, divChng=NA, gwChng=NA, iter=NA)
>
> #Can the following be rescripted for better harnessing R's processing
> power?
>
> for (i in 1:length(timeStep)){
> for (j in 1:length(SEG)){
> datTS <- subset(dat,ts==timeStep[i] & ISEG==SEG[j] & IRCH==1)
> datGW <- subset(dat,ts==timeStep[i] & ISEG==SEG[j])
> grw <- aggregate(gw ~ iter, datGW, sum)
>
> DC <- max(datTS$div)-min(datTS$div)
> GRW <- max(grw$gw) - min(grw$gw)
> divChng <- rbind(divChng,c(datTS$ts[1], SEG[j], DC, GRW,
> max(datTS$iter)))
> }
> }
> divChng <- divChng[!is.na(divChng$ISEG),]
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help op r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list