[R] numbering consecutive rows based on length criteria
Morway, Eric
emorway at usgs.gov
Mon Mar 2 18:43:55 CET 2015
Using this dataset:
dat <- read.table(textConnection("day noRes.Q wRes.Q
1 237074.41 215409.41
2 2336240.20 164835.16
3 84855.42 357062.72
4 76993.48 386326.78
5 73489.47 307144.09
6 70246.96 75885.75
7 69630.09 74054.33
8 66714.78 70071.80
9 122296.90 66579.08
10 63502.71 65811.37
11 63401.84 64795.12
12 63387.84 64401.14
13 63186.10 64163.95
14 63160.74 63468.25
15 60471.15 60719.15
16 58235.63 57655.14
17 58089.73 58061.34
18 57846.39 57357.89
19 57839.42 56495.69
20 57740.06 56219.97
21 58068.57 55810.91
22 58358.34 56437.81
23 76284.90 73722.92
24 105138.31 100729.00
25 147203.03 178079.38
26 109996.02 111113.95
27 91424.20 87391.56
28 89065.91 87196.69
29 86628.74 84809.07
30 79357.60 77555.62"),header=TRUE)
I'm attempting to generate a column that continuously numbers consecutive
rows where wRes.Q is greater than noRes.Q. To that end, I've come up with
the following:
dat$flg <- dat$wRes.Q>dat$noRes.Q
dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
The problem with dat$cnt is that ave() groups all TRUE rows together and
all FALSE rows together, so the count doesn't restart at 1 when a new run
of TRUE or FALSE values begins. For example, dat$cnt[9] should be 1,
dat$cnt[10] should also be 1, dat$cnt[11] should be 2, and so on (the
desired result is shown below).
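One possible way to get a count that restarts with each run is to use rle() together with sequence(), both from base R. The following is only a minimal sketch, not a tested solution for the full dataset. To keep it self-contained, the flg vector is rebuilt here by hand from the run lengths of the 30-row example (an assumption for illustration); with the real data it would simply be dat$flg.

```r
# flg rebuilt by hand from the example data (illustrative stand-in for dat$flg):
# runs of FALSE/TRUE with lengths 2, 6, 1, 6, 9, 2, 4 (30 rows in total)
flg <- rep(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
           c(2, 6, 1, 6, 9, 2, 4))

# rle() finds the consecutive runs; sequence() then counts 1..n within each run
runs <- rle(flg)
cnt  <- sequence(runs$lengths)

cnt[9:11]   # 1 1 2, i.e. the count restarts at rows 9 and 10
```

With the data frame from the post, the same idea would read dat$cnt <- sequence(rle(dat$flg)$lengths).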
In the larger dataset I'm working with (>6,000 rows), there are blocks of
rows where the number of consecutive rows with dat$flg==TRUE exceeds 100.
My goal is to plot these blocks of rows as polygons in a time series plot.
If, for the small example provided, the number of consecutive rows with
dat$flg==TRUE is greater than or equal to 5 (the 2 blocks of rows
satisfying this criterion in this small example are rows 3-8 and 10-15), is
there a way to add a column that uniquely numbers these blocks of rows? I'd
like to end up with the following, which shows the correct "cnt" column and
a column called "plygn" that is my ultimate goal:
dat
# day noRes.Q wRes.Q flg cnt plygn
# 1 237074.41 215409.41 FALSE 1 NA
# 2 2336240.20 164835.16 FALSE 2 NA
# 3 84855.42 357062.72 TRUE 1 1
# 4 76993.48 386326.78 TRUE 2 1
# 5 73489.47 307144.09 TRUE 3 1
# 6 70246.96 75885.75 TRUE 4 1
# 7 69630.09 74054.33 TRUE 5 1
# 8 66714.78 70071.80 TRUE 6 1
# 9 122296.90 66579.08 FALSE 1 NA
# 10 63502.71 65811.37 TRUE 1 2
# 11 63401.84 64795.12 TRUE 2 2
# 12 63387.84 64401.14 TRUE 3 2
# 13 63186.10 64163.95 TRUE 4 2
# 14 63160.74 63468.25 TRUE 5 2
# 15 60471.15 60719.15 TRUE 6 2
# 16 58235.63 57655.14 FALSE 1 NA
# 17 58089.73 58061.34 FALSE 2 NA
# 18 57846.39 57357.89 FALSE 3 NA
# 19 57839.42 56495.69 FALSE 4 NA
# 20 57740.06 56219.97 FALSE 5 NA
# 21 58068.57 55810.91 FALSE 6 NA
# 22 58358.34 56437.81 FALSE 7 NA
# 23 76284.90 73722.92 FALSE 8 NA
# 24 105138.31 100729.00 FALSE 9 NA
# 25 147203.03 178079.38 TRUE 1 NA
# 26 109996.02 111113.95 TRUE 2 NA
# 27 91424.20 87391.56 FALSE 1 NA
# 28 89065.91 87196.69 FALSE 2 NA
# 29 86628.74 84809.07 FALSE 3 NA
# 30 79357.60 77555.62 FALSE 4 NA
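For what it's worth, the "plygn" column shown above could plausibly be built from the same run-length encoding. This is a hedged sketch only: the names keep, id and min_len are made up for illustration, and flg is again rebuilt by hand from the example's run lengths rather than read from dat.

```r
# flg rebuilt by hand from the example data (illustrative stand-in for dat$flg)
flg <- rep(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
           c(2, 6, 1, 6, 9, 2, 4))

min_len <- 5                     # hypothetical threshold: minimum run length
runs <- rle(flg)

# a run qualifies if it is a TRUE run of at least min_len rows
keep <- runs$values & runs$lengths >= min_len

# number the qualifying runs 1, 2, ...; every other run gets NA
id <- ifelse(keep, cumsum(keep), NA_integer_)

# expand the per-run ids back out to one value per row
plygn <- rep(id, runs$lengths)

which(plygn == 1)   # rows 3 to 8
which(plygn == 2)   # rows 10 to 15
```

The short TRUE run at rows 25-26 stays NA because it has fewer than min_len rows. For the larger dataset, only min_len would need to change.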
Thanks, Eric