[R] applying cumsum within groups

Morway, Eric emorway at usgs.gov
Fri Apr 3 14:17:46 CEST 2015


This small example will be applied to a problem with 1.4e6 lines of data.
First, here is the dataset and a few lines of R script, followed by an
explanation of what I'd like to get:

dat <- read.table(textConnection("ISEG  IRCH  val
 1    1   265
 1    2   260
 1    3   234
54   39   467
54   40   468
54   41   460
54   42   489
 1    1   265
 1    2   276
 1    3   217
54   39   456
54   40   507
54   41   483
54   42   457
 1    1   265
 1    2   287
 1    3   224
54   39   473
54   40   502
54   41   497
54   42   447
 1    1   230
 1    2   251
 1    3   199
54   39   439
54   40   474
54   41   477
54   42   413
 1    1   230
 1    2   262
 1    3   217
54   39   455
54   40   493
54   41   489
54   42   431
 1    1   1002
 1    2   1222
 1    3   1198
54   39   1876
54   40   1565
54   41   1455
54   42   1427
 1    1   1002
 1    2   1246
 1    3   1153
54   39   1813
54   40   1490
54   41   1518
54   42   1486
 1    1   1002
 1    2   1229
 1    3   1142
54   39   1797
54   40   1517
54   41   1527
54   42   1514"),header=TRUE)

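# Flag the first row of each repeated block (ISEG == 1 and IRCH == 1)
dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
# TRUE where 'val' on a flagged row differs from the previous flagged row
tmp <- diff(dat[dat$seq==1,]$val)!=0
dat$idx <- 0
# Mark the first flagged row, plus every flagged row where 'val' changed
dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
# The running count of those marks gives the group id 'ts'
dat$ts <- cumsum(dat$idx)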

At this point, I'd like to add one more column called "iter" that increments
by 1 at every row where "seq" is 1, restarting the count within each "ts".
The result would look like this (undoubtedly this is a simple problem with
something like ddply, but I've been unable to construct the R code for it):

dat
 ISEG IRCH  val seq idx ts iter
    1    1  265   1   1  1    1
    1    2  260   0   0  1    1
    1    3  234   0   0  1    1
   54   39  467   0   0  1    1
   54   40  468   0   0  1    1
   54   41  460   0   0  1    1
   54   42  489   0   0  1    1
    1    1  265   1   0  1    2
    1    2  276   0   0  1    2
    1    3  217   0   0  1    2
   54   39  456   0   0  1    2
   54   40  507   0   0  1    2
   54   41  483   0   0  1    2
   54   42  457   0   0  1    2
    1    1  265   1   0  1    3
    1    2  287   0   0  1    3
    1    3  224   0   0  1    3
   54   39  473   0   0  1    3
   54   40  502   0   0  1    3
   54   41  497   0   0  1    3
   54   42  447   0   0  1    3
    1    1  230   1   1  2    1
    1    2  251   0   0  2    1
    1    3  199   0   0  2    1
   54   39  439   0   0  2    1
   54   40  474   0   0  2    1
   54   41  477   0   0  2    1
   54   42  413   0   0  2    1
    1    1  230   1   0  2    2
    1    2  262   0   0  2    2
    1    3  217   0   0  2    2
   54   39  455   0   0  2    2
   54   40  493   0   0  2    2
   54   41  489   0   0  2    2
   54   42  431   0   0  2    2
    1    1 1002   1   1  3    1
    1    2 1222   0   0  3    1
    1    3 1198   0   0  3    1
   54   39 1876   0   0  3    1
   54   40 1565   0   0  3    1
   54   41 1455   0   0  3    1
   54   42 1427   0   0  3    1
    1    1 1002   1   0  3    2
    1    2 1246   0   0  3    2
    1    3 1153   0   0  3    2
   54   39 1813   0   0  3    2
   54   40 1490   0   0  3    2
   54   41 1518   0   0  3    2
   54   42 1486   0   0  3    2
    1    1 1002   1   0  3    3
    1    2 1229   0   0  3    3
    1    3 1142   0   0  3    3
   54   39 1797   0   0  3    3
   54   40 1517   0   0  3    3
   54   41 1527   0   0  3    3
   54   42 1514   0   0  3    3
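
One possible way to get a column like "iter", sketched here on a small toy data frame rather than the full dataset, is a grouped cumulative sum with base R's ave(). The name "toy" and its values are made up for illustration; only the "seq" and "ts" columns mirror the ones built above.

```r
# Toy stand-in (hypothetical data): 'seq' flags the first row of each
# repeated block, 'ts' is the group id.
toy <- data.frame(
  seq = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0),
  ts  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# ave() applies FUN to 'seq' separately within each level of 'ts' and
# returns the results in the original row order, so the count restarts
# whenever 'ts' changes.
toy$iter <- ave(toy$seq, toy$ts, FUN = cumsum)

print(toy)
```

Under the same assumptions, the equivalent call on the real data would be dat$iter <- ave(dat$seq, dat$ts, FUN = cumsum). I have not timed this on 1.4e6 rows.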
