[R] applying cumsum within groups
David Winsemius
dwinsemius at comcast.net
Fri Apr 3 18:17:20 CEST 2015
On Apr 3, 2015, at 5:17 AM, Morway, Eric wrote:
> This small example will be applied to a problem with 1.4e6 lines of data.
> First, here is the dataset and a few lines of R script, followed by an
> explanation of what I'd like to get:
>
> dat <- read.table(textConnection("ISEG IRCH val
> 1 1 265
> 1 2 260
> 1 3 234
> 54 39 467
> 54 40 468
> 54 41 460
> 54 42 489
> 1 1 265
> 1 2 276
> 1 3 217
> 54 39 456
> 54 40 507
> 54 41 483
> 54 42 457
> 1 1 265
> 1 2 287
> 1 3 224
> 54 39 473
> 54 40 502
> 54 41 497
> 54 42 447
> 1 1 230
> 1 2 251
> 1 3 199
> 54 39 439
> 54 40 474
> 54 41 477
> 54 42 413
> 1 1 230
> 1 2 262
> 1 3 217
> 54 39 455
> 54 40 493
> 54 41 489
> 54 42 431
> 1 1 1002
> 1 2 1222
> 1 3 1198
> 54 39 1876
> 54 40 1565
> 54 41 1455
> 54 42 1427
> 1 1 1002
> 1 2 1246
> 1 3 1153
> 54 39 1813
> 54 40 1490
> 54 41 1518
> 54 42 1486
> 1 1 1002
> 1 2 1229
> 1 3 1142
> 54 39 1797
> 54 40 1517
> 54 41 1527
> 54 42 1514"),header=TRUE)
>
> dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
> tmp <- diff(dat[dat$seq==1,]$val)!=0
> dat$idx <- 0
> dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
> dat$ts <- cumsum(dat$idx)
>
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts". So, the result would look like
> this (undoubtedly this is a simple problem with something like ddply, but
> I've been unable to construct the R for it):
# Running count of the seq == 1 rows within each ts group:
dat$iter2 <- ave(dat$seq, dat$ts, FUN = cumsum)
dat
   ISEG IRCH val seq idx ts iter iter2
1     1    1 265   1   1  1  1_1     1
2     1    2 260   0   0  1  1_1     1
3     1    3 234   0   0  1  1_1     1
4    54   39 467   0   0  1  1_1     1
5    54   40 468   0   0  1  1_1     1
6    54   41 460   0   0  1  1_1     1
7    54   42 489   0   0  1  1_1     1
8     1    1 265   1   0  1  1_2     2
9     1    2 276   0   0  1  1_2     2
10    1    3 217   0   0  1  1_2     2
11   54   39 456   0   0  1  1_2     2
12   54   40 507   0   0  1  1_2     2
13   54   41 483   0   0  1  1_2     2
14   54   42 457   0   0  1  1_2     2
15    1    1 265   1   0  1  1_3     3
16    1    2 287   0   0  1  1_3     3
17    1    3 224   0   0  1  1_3     3
18   54   39 473   0   0  1  1_3     3
19   54   40 502   0   0  1  1_3     3
20   54   41 497   0   0  1  1_3     3
21   54   42 447   0   0  1  1_3     3
22    1    1 230   1   1  2  2_4     1
23    1    2 251   0   0  2  2_4     1
snipped----->
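
For anyone who wants to see the mechanism in isolation, here is a minimal sketch of the same ave() plus cumsum() idea on a tiny made-up table. The data frame `toy` and its values are assumptions for illustration only; they are not taken from the data in this thread.

```r
# Toy data (hypothetical): 'seq' flags the first row of each pass through the
# segments, 'ts' labels the time step each row belongs to.
toy <- data.frame(
  seq = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  ts  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# ave() splits 'seq' by 'ts', applies cumsum() to each piece, and returns the
# results in the original row order, so the count restarts at each new 'ts'.
toy$iter <- ave(toy$seq, toy$ts, FUN = cumsum)

print(toy)
```

Because ave() returns a vector of the same length and order as its first argument, the result can be assigned straight into a new column without any merging or re-sorting.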
--
David
>
> dat
> ISEG IRCH val seq idx ts iter
> 1 1 265 1 1 1 1
> 1 2 260 0 0 1 1
> 1 3 234 0 0 1 1
> 54 39 467 0 0 1 1
> 54 40 468 0 0 1 1
> 54 41 460 0 0 1 1
> 54 42 489 0 0 1 1
> 1 1 265 1 0 1 2
> 1 2 276 0 0 1 2
> 1 3 217 0 0 1 2
> 54 39 456 0 0 1 2
> 54 40 507 0 0 1 2
> 54 41 483 0 0 1 2
> 54 42 457 0 0 1 2
> 1 1 265 1 0 1 3
> 1 2 287 0 0 1 3
> 1 3 224 0 0 1 3
> 54 39 473 0 0 1 3
> 54 40 502 0 0 1 3
> 54 41 497 0 0 1 3
> 54 42 447 0 0 1 3
> 1 1 230 1 1 2 1
> 1 2 251 0 0 2 1
> 1 3 199 0 0 2 1
> 54 39 439 0 0 2 1
> 54 40 474 0 0 2 1
> 54 41 477 0 0 2 1
> 54 42 413 0 0 2 1
> 1 1 230 1 0 2 2
> 1 2 262 0 0 2 2
> 1 3 217 0 0 2 2
> 54 39 455 0 0 2 2
> 54 40 493 0 0 2 2
> 54 41 489 0 0 2 2
> 54 42 431 0 0 2 2
> 1 1 1002 1 1 3 1
> 1 2 1222 0 0 3 1
> 1 3 1198 0 0 3 1
> 54 39 1876 0 0 3 1
> 54 40 1565 0 0 3 1
> 54 41 1455 0 0 3 1
> 54 42 1427 0 0 3 1
> 1 1 1002 1 0 3 2
> 1 2 1246 0 0 3 2
> 1 3 1153 0 0 3 2
> 54 39 1813 0 0 3 2
> 54 40 1490 0 0 3 2
> 54 41 1518 0 0 3 2
> 54 42 1486 0 0 3 2
> 1 1 1002 1 0 3 3
> 1 2 1229 0 0 3 3
> 1 3 1142 0 0 3 3
> 54 39 1797 0 0 3 3
> 54 40 1517 0 0 3 3
> 54 41 1527 0 0 3 3
> 54 42 1514 0 0 3 3
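
Since the real table is said to have about 1.4e6 rows, a fully vectorized variant may also be worth trying. The sketch below is an assumption, not something proposed in this thread: the helper name `grouped_cumsum` is hypothetical, and it only works when each group occupies one contiguous run of rows (which holds for a `ts` column built with cumsum(), as in the quoted code). Whether it is actually faster than ave() on the full data would need to be timed.

```r
# Hypothetical helper (not from the thread): cumulative sum of x that restarts
# at each new value of g. Assumes every group in g is one contiguous run.
grouped_cumsum <- function(x, g) {
  total  <- cumsum(x)              # running total over all rows
  first  <- !duplicated(g)         # TRUE on the first row of each group
  offset <- (total - x)[first]     # total accumulated before each group starts
  total - offset[cumsum(first)]    # subtract the offset of the row's own group
}

# Made-up example values, for illustration only.
seq_flag <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
ts_id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

iter <- grouped_cumsum(seq_flag, ts_id)
print(iter)
```

On the thread's data the equivalent call would be grouped_cumsum(dat$seq, dat$ts), and comparing it with the ave() result via all.equal() is a cheap sanity check before relying on it.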
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA