[R] applying cumsum within groups

peter dalgaard pdalgd at gmail.com
Fri Apr 3 19:12:39 CEST 2015


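ave() is your friend (unfortunately named as it may be). It applies FUN within each level of the grouping variable and returns a vector aligned with the input rows, so the result can be assigned straight to dat$iter: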

> ave(dat$seq, dat$ts, FUN=cumsum)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1
[39] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3
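
To see what ave() is doing, here is a minimal sketch on made-up data. The data frame `toy` and its column `grp_mean` are hypothetical names used only for illustration; `seq`, `ts` and `iter` mirror the columns in the question. With its default FUN it computes group means, which is presumably where the name comes from; with FUN=cumsum it gives a running count that restarts in each group.

```r
# Toy data (hypothetical): 'seq' flags the first row of each block,
# 'ts' labels the group within which the counter should restart.
toy <- data.frame(
  seq = c(1, 0, 1, 0, 1, 0, 1, 0),
  ts  = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Default FUN is mean: every row receives the mean of 'seq' in its group.
toy$grp_mean <- ave(toy$seq, toy$ts)

# FUN = cumsum: running total of the flags, computed separately per 'ts'.
# The result has one value per row, in the original row order.
toy$iter <- ave(toy$seq, toy$ts, FUN = cumsum)

print(toy)
```

On the data in the question, the same pattern would be dat$iter <- ave(dat$seq, dat$ts, FUN=cumsum).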


> On 03 Apr 2015, at 14:17, Morway, Eric <emorway at usgs.gov> wrote:
> 
> This small example will be applied to a problem with 1.4e6 lines of data.
> First, here is the dataset and a few lines of R script, followed by an
> explanation of what I'd like to get:
> 
> dat <- read.table(textConnection("ISEG  IRCH  val
> 1    1   265
> 1    2   260
> 1    3   234
> 54   39   467
> 54   40   468
> 54   41   460
> 54   42   489
> 1    1   265
> 1    2   276
> 1    3   217
> 54   39   456
> 54   40   507
> 54   41   483
> 54   42   457
> 1    1   265
> 1    2   287
> 1    3   224
> 54   39   473
> 54   40   502
> 54   41   497
> 54   42   447
> 1    1   230
> 1    2   251
> 1    3   199
> 54   39   439
> 54   40   474
> 54   41   477
> 54   42   413
> 1    1   230
> 1    2   262
> 1    3   217
> 54   39   455
> 54   40   493
> 54   41   489
> 54   42   431
> 1    1   1002
> 1    2   1222
> 1    3   1198
> 54   39   1876
> 54   40   1565
> 54   41   1455
> 54   42   1427
> 1    1   1002
> 1    2   1246
> 1    3   1153
> 54   39   1813
> 54   40   1490
> 54   41   1518
> 54   42   1486
> 1    1   1002
> 1    2   1229
> 1    3   1142
> 54   39   1797
> 54   40   1517
> 54   41   1527
> 54   42   1514"),header=TRUE)
> 
> dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
> tmp <- diff(dat[dat$seq==1,]$val)!=0
> dat$idx <- 0
> dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
> dat$ts <- cumsum(dat$idx)
> 
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".  So, the result would look like
> this (undoubtedly this is a simple problem with something like ddply, but
> I've been unable to construct the R for it):
> 
> dat
> ISEG IRCH  val seq idx ts iter
>    1    1  265   1   1  1    1
>    1    2  260   0   0  1    1
>    1    3  234   0   0  1    1
>   54   39  467   0   0  1    1
>   54   40  468   0   0  1    1
>   54   41  460   0   0  1    1
>   54   42  489   0   0  1    1
>    1    1  265   1   0  1    2
>    1    2  276   0   0  1    2
>    1    3  217   0   0  1    2
>   54   39  456   0   0  1    2
>   54   40  507   0   0  1    2
>   54   41  483   0   0  1    2
>   54   42  457   0   0  1    2
>    1    1  265   1   0  1    3
>    1    2  287   0   0  1    3
>    1    3  224   0   0  1    3
>   54   39  473   0   0  1    3
>   54   40  502   0   0  1    3
>   54   41  497   0   0  1    3
>   54   42  447   0   0  1    3
>    1    1  230   1   1  2    1
>    1    2  251   0   0  2    1
>    1    3  199   0   0  2    1
>   54   39  439   0   0  2    1
>   54   40  474   0   0  2    1
>   54   41  477   0   0  2    1
>   54   42  413   0   0  2    1
>    1    1  230   1   0  2    2
>    1    2  262   0   0  2    2
>    1    3  217   0   0  2    2
>   54   39  455   0   0  2    2
>   54   40  493   0   0  2    2
>   54   41  489   0   0  2    2
>   54   42  431   0   0  2    2
>    1    1 1002   1   1  3    1
>    1    2 1222   0   0  3    1
>    1    3 1198   0   0  3    1
>   54   39 1876   0   0  3    1
>   54   40 1565   0   0  3    1
>   54   41 1455   0   0  3    1
>   54   42 1427   0   0  3    1
>    1    1 1002   1   0  3    2
>    1    2 1246   0   0  3    2
>    1    3 1153   0   0  3    2
>   54   39 1813   0   0  3    2
>   54   40 1490   0   0  3    2
>   54   41 1518   0   0  3    2
>   54   42 1486   0   0  3    2
>    1    1 1002   1   0  3    3
>    1    2 1229   0   0  3    3
>    1    3 1142   0   0  3    3
>   54   39 1797   0   0  3    3
>   54   40 1517   0   0  3    3
>   54   41 1527   0   0  3    3
>   54   42 1514   0   0  3    3
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


