[R] subset only if f.e a column is successive for more than 3 values

Knut Krueger rhe|p @end|ng |rom krueger-|@m||y@de
Fri Sep 28 17:08:31 CEST 2018


Hi Jim,
thanks, it works with the given example,
but what is the difference when using

testdata <- data.frame(
  TIME = c("17:11:20", "17:11:21", "17:11:22", "17:11:23", "17:11:24",
           "17:11:25", "17:11:26", "17:11:27", "17:11:28", "17:21:43",
           "17:22:16", "17:22:19", "18:04:48", "18:04:49", "18:04:50",
           "18:04:51", "18:04:52", "19:50:09", "00:59:27", "00:59:28",
           "00:59:29", "04:13:40", "04:13:43", "04:13:44"),
  index = c(8960, 8961, 8962, 8963, 8964, 8965, 8966, 8967, 8968, 9583,
            9616, 9619, 12168, 12169, 12170, 12171, 12172, 18489, 37047,
            37048, 37049, 48700, 48701, 48702))

seqindx<-rle(diff(testdata$index)==1)
runsel<-seqindx$lengths >= 3 & seqindx$values
# get the indices for the starts of the runs
starts<-cumsum(seqindx$lengths)[runsel[-1]]+1
# and the ends
ends<-cumsum(seqindx$lengths)[runsel]+1

eval(parse(text=paste0("testdata[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

The result (index) is
12168, 9619, 9616, 9583, 8968, 12168, 12169, 12170, 12171, 12172

Could the gaps between ... 8967, 8968, 9583, 9616, 9619, 12168, 12169 ... be the reason?
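
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the gaps themselves look harmless, but runsel[-1] drops the first element of runsel, so the start positions are shifted by one run. That only works when the first run is not selected. Here the very first run (8960 to 8968) qualifies, so starts comes out as 13 only, and paste() recycles it against ends 9 and 17, giving 13:9 and 13:17, which matches the output above. The variant below derives each start from its end and the run length instead. The names index and rows are introduced here for illustration, and Map(seq, ...) is swapped in for eval(parse()).

```r
# Same index column as in testdata, rebuilt so the sketch is self-contained
index <- c(8960:8968, 9583, 9616, 9619, 12168:12172, 18489,
           37047:37049, 48700:48702)

# Runs of consecutive differences equal to 1
seqindx <- rle(diff(index) == 1)
# Keep runs of at least 3 such differences (4 or more successive values)
runsel <- seqindx$lengths >= 3 & seqindx$values

# Row number of the last element of each selected run
ends <- cumsum(seqindx$lengths)[runsel] + 1
# A run of n differences spans n + 1 rows, so the first row is n rows earlier
starts <- ends - seqindx$lengths[runsel]

# Expand each start:end pair into row numbers without eval(parse())
rows <- unlist(Map(seq, starts, ends))
index[rows]
```

With this data, starts is 1 and 13, ends is 9 and 17, and testdata[rows, ] would return the rows with index 8960 to 8968 and 12168 to 12172.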

Regards Knut



