[R] applying cumsum within groups
Morway, Eric
emorway at usgs.gov
Fri Apr 3 14:17:46 CEST 2015
This small example stands in for a real problem with about 1.4e6 lines of data.
First, here is the dataset and a few lines of R script, followed by an
explanation of the result I'd like to get:
dat <- read.table(textConnection("ISEG IRCH val
1 1 265
1 2 260
1 3 234
54 39 467
54 40 468
54 41 460
54 42 489
1 1 265
1 2 276
1 3 217
54 39 456
54 40 507
54 41 483
54 42 457
1 1 265
1 2 287
1 3 224
54 39 473
54 40 502
54 41 497
54 42 447
1 1 230
1 2 251
1 3 199
54 39 439
54 40 474
54 41 477
54 42 413
1 1 230
1 2 262
1 3 217
54 39 455
54 40 493
54 41 489
54 42 431
1 1 1002
1 2 1222
1 3 1198
54 39 1876
54 40 1565
54 41 1455
54 42 1427
1 1 1002
1 2 1246
1 3 1153
54 39 1813
54 40 1490
54 41 1518
54 42 1486
1 1 1002
1 2 1229
1 3 1142
54 39 1797
54 40 1517
54 41 1527
54 42 1514"),header=TRUE)
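# Flag the first row of each repeated block (ISEG 1, IRCH 1)
dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
# TRUE where 'val' on a flagged row differs from the previous flagged row
tmp <- diff(dat[dat$seq==1,]$val)!=0
dat$idx <- 0
# Mark the first flagged row and every flagged row where 'val' changed
dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
# The running count of those marks gives the group id
dat$ts <- cumsum(dat$idx)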
At this point, I'd like to add one more column called "iter" that goes up
by 1 at every row where "seq" is 1, restarting at 1 within each "ts". So, the
result would look like this (undoubtedly this is a simple problem with
something like ddply, but I've been unable to work out the R code for it):
dat
ISEG IRCH val seq idx ts iter
1 1 265 1 1 1 1
1 2 260 0 0 1 1
1 3 234 0 0 1 1
54 39 467 0 0 1 1
54 40 468 0 0 1 1
54 41 460 0 0 1 1
54 42 489 0 0 1 1
1 1 265 1 0 1 2
1 2 276 0 0 1 2
1 3 217 0 0 1 2
54 39 456 0 0 1 2
54 40 507 0 0 1 2
54 41 483 0 0 1 2
54 42 457 0 0 1 2
1 1 265 1 0 1 3
1 2 287 0 0 1 3
1 3 224 0 0 1 3
54 39 473 0 0 1 3
54 40 502 0 0 1 3
54 41 497 0 0 1 3
54 42 447 0 0 1 3
1 1 230 1 1 2 1
1 2 251 0 0 2 1
1 3 199 0 0 2 1
54 39 439 0 0 2 1
54 40 474 0 0 2 1
54 41 477 0 0 2 1
54 42 413 0 0 2 1
1 1 230 1 0 2 2
1 2 262 0 0 2 2
1 3 217 0 0 2 2
54 39 455 0 0 2 2
54 40 493 0 0 2 2
54 41 489 0 0 2 2
54 42 431 0 0 2 2
1 1 1002 1 1 3 1
1 2 1222 0 0 3 1
1 3 1198 0 0 3 1
54 39 1876 0 0 3 1
54 40 1565 0 0 3 1
54 41 1455 0 0 3 1
54 42 1427 0 0 3 1
1 1 1002 1 0 3 2
1 2 1246 0 0 3 2
1 3 1153 0 0 3 2
54 39 1813 0 0 3 2
54 40 1490 0 0 3 2
54 41 1518 0 0 3 2
54 42 1486 0 0 3 2
1 1 1002 1 0 3 3
1 2 1229 0 0 3 3
1 3 1142 0 0 3 3
54 39 1797 0 0 3 3
54 40 1517 0 0 3 3
54 41 1527 0 0 3 3
54 42 1514 0 0 3 3
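Since "iter" in the table above is just a running count of the "seq" flags
that restarts with each "ts", one possible approach is a grouped cumsum using
base R's ave(). The following is only a minimal sketch, not a tested solution
for the full 1.4e6-line data. The data frame "toy" is a made-up stand-in that
keeps only the "seq" and "ts" columns, and it assumes, as in the example, that
every "ts" group begins on a row where "seq" is 1.

```r
# Hypothetical stand-in for 'dat': three rows per block, 'seq' flags the
# first row of each block, 'ts' is the group id built with cumsum(idx).
toy <- data.frame(
  seq = c(1, 0, 0,  1, 0, 0,  1, 0, 0,  1, 0, 0,  1, 0, 0),
  ts  = c(1, 1, 1,  1, 1, 1,  2, 2, 2,  3, 3, 3,  3, 3, 3)
)

# ave() applies FUN to 'seq' separately within each level of 'ts' and
# returns the results in the original row order.
toy$iter <- ave(toy$seq, toy$ts, FUN = cumsum)

toy$iter
# [1] 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2
```

On the posted data the equivalent call would presumably be
dat$iter <- ave(dat$seq, dat$ts, FUN = cumsum).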