[R] subset only if f.e a column is successive for more than 3 values

William Dunlap wdunlap at tibco.com
Fri Sep 28 17:22:59 CEST 2018


Do you also want lines 38 and 39 (in addition to 40:44), or do I
misunderstand your problem?

When you deal with runs of data, think of the rle (run-length encoding)
function.  E.g., here is a barely tested function to find runs of a given
minimum length and a given difference between successive values.  It also
returns a 'runNumber' so you can split the result into runs.

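findRuns <- function(x, minRunLength=3, difference=1) {
     # for integral x, find runs of length at least 'minRunLength'
     # with 'difference' between successive values
     d <- diff(x)
     dRle <- rle(d)
     # flag each difference that belongs to a long enough run
     w <- rep(dRle$lengths>=minRunLength-1 & dRle$values==difference,
              dRle$lengths)
     # difference i joins x[i] and x[i+1], so keep both of its end points
     keep <- c(FALSE,w) | c(w,FALSE)
     values <- x[keep]
     # a run starts at a kept position whose preceding difference is not
     # flagged; numbering by position keeps separate runs apart even when one
     # run happens to continue the values of the previous one
     runNumber <- cumsum(keep & !c(FALSE,w))[keep]
     data.frame(values=values, runNumber=runNumber)
}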
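
To see how the pieces fit together, the intermediate values for the first
example below can be traced step by step.  This is only an illustrative
sketch with the default arguments written out; the names 'ok' and 'keep'
are introduced here for readability.

```r
x <- c(10, 8, 6, 4, 1, 2, 3, 20, 17, 18, 19, 20)

d <- diff(x)      # differences between neighbouring values
dRle <- rle(d)    # collapse equal neighbouring differences into runs
# dRle$lengths:  3  1  2  1  1  3
# dRle$values:  -2 -3  1 17 -3  1

# A run of k equal differences spans k + 1 values, hence 'minRunLength - 1'
# (here minRunLength = 3 and difference = 1).
ok <- dRle$lengths >= 3 - 1 & dRle$values == 1
w <- rep(ok, dRle$lengths)    # expand back to one flag per difference

# Difference i joins x[i] and x[i + 1], so both end points are kept.
keep <- c(FALSE, w) | c(w, FALSE)
x[keep]
# 1 2 3 17 18 19 20
```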

> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20))
  values runNumber
1      1         1
2      2         1
3      3         1
4     17         2
5     18         2
6     19         2
7     20         2
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), minRunLength=4)
  values runNumber
1     17         1
2     18         1
3     19         1
4     20         1
> findRuns(c(10,8,6,4,1,2,3,20,17,18,19,20), difference=-2)
  values runNumber
1     10         1
2      8         1
3      6         1
4      4         1
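
For the airquality example in the question, the same idea can be applied
to the row numbers of the filtered rows.  The following is a minimal sketch,
assuming that "successive" means consecutive row numbers; the names idx, grp,
len and minRunLength are only illustrative.

```r
# Row numbers of the rows that pass the filter from the question.
idx <- which(airquality$Temp > 80)

# Start a new group wherever a row number does not directly follow
# its predecessor.
grp <- cumsum(c(TRUE, diff(idx) != 1))

# Size of the group each row belongs to (rle gives one length per group).
grpRle <- rle(grp)
len <- rep(grpRle$lengths, grpRle$lengths)

# Keep only rows that sit in a group of at least minRunLength
# consecutive row numbers.
minRunLength <- 3    # illustrative threshold
sel <- len >= minRunLength
keep <- idx[sel]

result <- airquality[keep, c("Ozone", "Temp")]
# Renumber the surviving groups 1, 2, ... so the result can be split by run.
result$runNumber <- match(grp[sel], unique(grp[sel]))
head(result, 10)
```

With this reading, rows 38 and 39 are kept together with 40:44, because
38:44 is one unbroken stretch of row numbers.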


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Sep 27, 2018 at 7:48 AM, Knut Krueger <rhelp using krueger-family.de>
wrote:

> Hi to all
>
> I need to subset a data frame, keeping values only if, e.g., 3 successive
> values occur in a column:
> Example from the subset help page:
>
> subset(airquality, Temp > 80, select = c(Ozone, Temp))
> 29     45   81
> 35     NA   84
> 36     NA   85
> 38     29   82
> 39     NA   87
> 40     71   90
> 41     39   87
> 42     NA   93
> 43     NA   92
> 44     23   82
> .....
>
> I would like to get only
>
> ...
> 40     71   90
> 41     39   87
> 42     NA   93
> 43     NA   92
> 44     23   82
> ....
>
> because the left column ascends more than, e.g., three times without a gap
>
> Any hints for a package, or do I need to build my own function?
>
> Kind Regards Knut