[R] numbering consecutive rows based on length criteria

Mon Mar 2 18:56:01 CET 2015

On Mar 2, 2015, at 11:43 AM, Morway, Eric <emorway at usgs.gov> wrote:
> 
> Using this dataset:
> 
> dat <- read.table(textConnection("day    noRes.Q    wRes.Q
> 1  237074.41 215409.41
> 2 2336240.20 164835.16
> 3   84855.42 357062.72
> 4   76993.48 386326.78
> 5   73489.47 307144.09
> 6   70246.96  75885.75
> 7   69630.09  74054.33
> 8   66714.78  70071.80
> 9  122296.90  66579.08
> 10   63502.71  65811.37
> 11   63401.84  64795.12
> 12   63387.84  64401.14
> 13   63186.10  64163.95
> 14   63160.74  63468.25
> 15   60471.15  60719.15
> 16   58235.63  57655.14
> 17   58089.73  58061.34
> 18   57846.39  57357.89
> 19   57839.42  56495.69
> 20   57740.06  56219.97
> 21   58068.57  55810.91
> 22   58358.34  56437.81
> 23   76284.90  73722.92
> 24  105138.31 100729.00
> 25  147203.03 178079.38
> 26  109996.02 111113.95
> 27   91424.20  87391.56
> 28   89065.91  87196.69
> 29   86628.74  84809.07
> 30   79357.60  77555.62"),header=T)
> 
> I'm attempting to generate a column that continuously numbers consecutive
> rows where wRes.Q is greater than noRes.Q.  To that end, I've come up with
> the following:
> 
> dat$flg <- dat$wRes.Q>dat$noRes.Q
> dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
> 
> The problem with dat$cnt is that it doesn't start over with 1 when a 'new'
> group of either true or false is encountered.  Thus, row 9's cnt value
> should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
> (the desired result is shown below)
> 
> In the larger dataset I'm working with (>6,000 rows), there are blocks of
> rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
> My goal is to plot these blocks of rows as polygons in a time series plot.
> If, for the small example provided, the number of consecutive rows with
> dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
> satisfying this criteria in this small example are rows 3-8 and 10-15), is
> there a way to add a column that uniquely numbers these blocks of rows? I'd
> like to end up with the following, which shows the correct "cnt" column and
> a column called "plygn" that is my ultimate goal:
> 
> dat
> # day    noRes.Q    wRes.Q   flg cnt  plygn
> #   1  237074.41 215409.41 FALSE   1     NA
> #   2 2336240.20 164835.16 FALSE   2     NA
> #   3   84855.42 357062.72  TRUE   1      1
> #   4   76993.48 386326.78  TRUE   2      1
> #   5   73489.47 307144.09  TRUE   3      1
> #   6   70246.96  75885.75  TRUE   4      1
> #   7   69630.09  74054.33  TRUE   5      1
> #   8   66714.78  70071.80  TRUE   6      1
> #   9  122296.90  66579.08 FALSE   1     NA
> #  10   63502.71  65811.37  TRUE   1      2
> #  11   63401.84  64795.12  TRUE   2      2
> #  12   63387.84  64401.14  TRUE   3      2
> #  13   63186.10  64163.95  TRUE   4      2
> #  14   63160.74  63468.25  TRUE   5      2
> #  15   60471.15  60719.15  TRUE   6      2
> #  16   58235.63  57655.14 FALSE   1     NA
> #  17   58089.73  58061.34 FALSE   2     NA
> #  18   57846.39  57357.89 FALSE   3     NA
> #  19   57839.42  56495.69 FALSE   4     NA
> #  20   57740.06  56219.97 FALSE   5     NA
> #  21   58068.57  55810.91 FALSE   6     NA
> #  22   58358.34  56437.81 FALSE   7     NA
> #  23   76284.90  73722.92 FALSE   8     NA
> #  24  105138.31 100729.00 FALSE   9     NA
> #  25  147203.03 178079.38  TRUE   1     NA
> #  26  109996.02 111113.95  TRUE   2     NA
> #  27   91424.20  87391.56 FALSE   1     NA
> #  28   89065.91  87196.69 FALSE   2     NA
> #  29   86628.74  84809.07 FALSE   3     NA
> #  30   79357.60  77555.62 FALSE   4     NA
> 
> Thanks, Eric

Hi,

See ?rle

> unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq))
 [1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4

cbind() the result above to your data frame.

Regards,

Marc Schwartz