[R] numbering consecutive rows based on length criteria
Thierry Onkelinx
thierry.onkelinx at inbo.be
Mon Mar 2 19:02:07 CET 2015
Dear Eric,
Here is a solution using the plyr package.
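library(plyr)

# TRUE where wRes.Q exceeds noRes.Q
dat$flg <- dat$wRes.Q > dat$noRes.Q
# The group id increases by 1 each time flg switches between TRUE and FALSE
dat$group <- cumsum(c(0, abs(diff(dat$flg))))

ddply(dat, "group", function(x) {
  if (x$flg[1] && nrow(x) >= 5) {
    # TRUE block of at least 5 rows: number its rows 1, 2, ...
    x$plygn <- seq_along(x$group)
  } else {
    # FALSE block, or TRUE block shorter than 5 rows
    x$plygn <- NA
  }
  x
})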
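Note that the code above numbers the rows within each qualifying block, whereas the desired output in the question restarts "cnt" in every block and gives each qualifying block a single id in "plygn". A minimal base R sketch of that variant, using rle(), could look like the following. The flag vector is rebuilt from the run lengths of the 30-row example so the snippet runs on its own, and the names min_len, keep and id are only illustrative.

```r
# Flag pattern of the 30-row example (TRUE where wRes.Q > noRes.Q),
# rebuilt from its run lengths so that this sketch is self-contained.
flg <- rep(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
           times = c(2, 6, 1, 6, 9, 2, 4))
dat <- data.frame(day = seq_along(flg), flg = flg)

min_len <- 5  # minimum block length, taken from the question

# Lengths and values of the runs of consecutive identical flags
r <- rle(dat$flg)

# Row counter that restarts at 1 at the start of every run
dat$cnt <- sequence(r$lengths)

# Runs that are TRUE and long enough get ids 1, 2, ...; all others get NA
keep <- r$values & r$lengths >= min_len
id <- rep(NA_integer_, length(keep))
id[keep] <- seq_len(sum(keep))

# Expand the per-run ids back to one value per row
dat$plygn <- rep(id, times = r$lengths)
dat
```

With the real data, replacing the first two statements by dat$flg <- dat$wRes.Q > dat$noRes.Q should be enough.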
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
2015-03-02 18:43 GMT+01:00 Morway, Eric <emorway at usgs.gov>:
> Using this dataset:
>
> dat <- read.table(textConnection("day noRes.Q wRes.Q
> 1 237074.41 215409.41
> 2 2336240.20 164835.16
> 3 84855.42 357062.72
> 4 76993.48 386326.78
> 5 73489.47 307144.09
> 6 70246.96 75885.75
> 7 69630.09 74054.33
> 8 66714.78 70071.80
> 9 122296.90 66579.08
> 10 63502.71 65811.37
> 11 63401.84 64795.12
> 12 63387.84 64401.14
> 13 63186.10 64163.95
> 14 63160.74 63468.25
> 15 60471.15 60719.15
> 16 58235.63 57655.14
> 17 58089.73 58061.34
> 18 57846.39 57357.89
> 19 57839.42 56495.69
> 20 57740.06 56219.97
> 21 58068.57 55810.91
> 22 58358.34 56437.81
> 23 76284.90 73722.92
> 24 105138.31 100729.00
> 25 147203.03 178079.38
> 26 109996.02 111113.95
> 27 91424.20 87391.56
> 28 89065.91 87196.69
> 29 86628.74 84809.07
> 30 79357.60 77555.62"), header=TRUE)
>
> I'm attempting to generate a column that continuously numbers consecutive
> rows where wRes.Q is greater than noRes.Q. To that end, I've come up with
> the following:
>
> dat$flg <- dat$wRes.Q>dat$noRes.Q
> dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
>
> The problem with dat$cnt is that it doesn't start over with 1 when a 'new'
> group of either true or false is encountered. Thus, row 9's cnt value
> should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
> (the desired result is shown below)
>
> In the larger dataset I'm working with (>6,000 rows), there are blocks of
> rows where the number of consecutive rows with dat$flg==TRUE exceeds 100.
> My goal is to plot these blocks of rows as polygons in a time series plot.
> If, for the small example provided, the number of consecutive rows with
> dat$flg==TRUE is greater than or equal to 5 (the 2 blocks of rows
> satisfying this criterion in this small example are rows 3-8 and 10-15), is
> there a way to add a column that uniquely numbers these blocks of rows? I'd
> like to end up with the following, which shows the correct "cnt" column and
> a column called "plygn" that is my ultimate goal:
>
> dat
> # day noRes.Q wRes.Q flg cnt plygn
> # 1 237074.41 215409.41 FALSE 1 NA
> # 2 2336240.20 164835.16 FALSE 2 NA
> # 3 84855.42 357062.72 TRUE 1 1
> # 4 76993.48 386326.78 TRUE 2 1
> # 5 73489.47 307144.09 TRUE 3 1
> # 6 70246.96 75885.75 TRUE 4 1
> # 7 69630.09 74054.33 TRUE 5 1
> # 8 66714.78 70071.80 TRUE 6 1
> # 9 122296.90 66579.08 FALSE 1 NA
> # 10 63502.71 65811.37 TRUE 1 2
> # 11 63401.84 64795.12 TRUE 2 2
> # 12 63387.84 64401.14 TRUE 3 2
> # 13 63186.10 64163.95 TRUE 4 2
> # 14 63160.74 63468.25 TRUE 5 2
> # 15 60471.15 60719.15 TRUE 6 2
> # 16 58235.63 57655.14 FALSE 1 NA
> # 17 58089.73 58061.34 FALSE 2 NA
> # 18 57846.39 57357.89 FALSE 3 NA
> # 19 57839.42 56495.69 FALSE 4 NA
> # 20 57740.06 56219.97 FALSE 5 NA
> # 21 58068.57 55810.91 FALSE 6 NA
> # 22 58358.34 56437.81 FALSE 7 NA
> # 23 76284.90 73722.92 FALSE 8 NA
> # 24 105138.31 100729.00 FALSE 9 NA
> # 25 147203.03 178079.38 TRUE 1 NA
> # 26 109996.02 111113.95 TRUE 2 NA
> # 27 91424.20 87391.56 FALSE 1 NA
> # 28 89065.91 87196.69 FALSE 2 NA
> # 29 86628.74 84809.07 FALSE 3 NA
> # 30 79357.60 77555.62 FALSE 4 NA
>
> Thanks, Eric