[R] Counting number of consecutive occurrences per rows
PIKAL Petr
petr.pikal at precheza.cz
Mon Apr 29 14:35:24 CEST 2013
Hi
rrr<-rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
test$res <- rep(rrr$lengths, rrr$lengths)
If you put it in function
fff<- function(x, limits=c(0,1,199,200)) {
rrr<-rle(as.numeric(cut(x, limits, include.lowest=T)))
res <- rep(rrr$lengths, rrr$lengths)
res
}
you can use split/lapply approach
test$res2<-unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))
Beware of correct ordering of days in output. Without correct leveling of factor 0 precedes 1.
And for the last part probably aggregate can be the way.
> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), max)
Group.1 Group.2 x
1 14655 [0,1] 4
2 14655 (1,199] 3
3 14655 (199,200] 3
> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), min)
Group.1 Group.2 x
1 14655 [0,1] 4
2 14655 (1,199] 1
3 14655 (199,200] 2
Regards
Petr
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of zuzana zajkova
> Sent: Monday, April 29, 2013 12:45 PM
> To: r-help at r-project.org
> Subject: [R] Counting number of consecutive occurrences per rows
>
> Hi,
>
> I would appreciate if somebody could help me with following
> calculation.
> I have a dataframe, by 10 minutes time, for mostly one year data. This
> is small example:
>
> > dput(test)
> structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655,
> 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
> 14655), origin = structure(0, class = "Date")),
> time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
> 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
> 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
> 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
> "GMT"),
> act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
> 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
> ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, 518L,
> 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
> 540L))
>
> Looks like this:
>
> > test
> jul time act day
> 510 14655 2010-02-15 18:25:54 130 1
> 512 14655 2010-02-15 18:35:54 23 1
> 514 14655 2010-02-15 18:45:54 45 1
> 516 14655 2010-02-15 18:55:54 200 1
> 518 14655 2010-02-15 19:05:54 200 1
> 520 14655 2010-02-15 19:15:54 200 1
> 522 14655 2010-02-15 19:25:54 199 1
> 524 14655 2010-02-15 19:35:54 150 1
> 526 14655 2010-02-15 19:45:54 0 1
> 528 14655 2010-02-15 19:55:54 0 1
> 530 14655 2010-02-15 20:05:54 0 0
> 532 14655 2010-02-15 20:15:54 0 0
> 534 14655 2010-02-15 20:25:54 34 0
> 536 14655 2010-02-15 20:35:54 200 0
> 538 14655 2010-02-15 20:45:54 200 0
> 540 14655 2010-02-15 20:55:54 145 0
>
>
> What I would like to calculate is the number of consecutive occurrences
> of values 200, 0 and together values from 1 til 199 (in fact the
> values that differ from 200 and 0) in column "act".
>
> I would like to get something like this (result$res)
>
> > result
> jul time act day res res2
> 510 14655 2010-02-15 18:25:54 130 1 3 3
> 512 14655 2010-02-15 18:35:54 23 1 3 3
> 514 14655 2010-02-15 18:45:54 45 1 3 3
> 516 14655 2010-02-15 18:55:54 200 1 3 3
> 518 14655 2010-02-15 19:05:54 200 1 3 3
> 520 14655 2010-02-15 19:15:54 200 1 3 3
> 522 14655 2010-02-15 19:25:54 199 1 2 2
> 524 14655 2010-02-15 19:35:54 150 1 2 2
> 526 14655 2010-02-15 19:45:54 0 1 4 2
> 528 14655 2010-02-15 19:55:54 0 1 4 2
> 530 14655 2010-02-15 20:05:54 0 0 4 2
> 532 14655 2010-02-15 20:15:54 0 0 4 2
> 534 14655 2010-02-15 20:25:54 34 0 1 1
> 536 14655 2010-02-15 20:35:54 200 0 2 2
> 538 14655 2010-02-15 20:45:54 200 0 2 2
> 540 14655 2010-02-15 20:55:54 145 0 1 1
>
> And if possible, distinguish among day==1 and day==0 (see the "act"
> values of 0 for example), results as in result$res2.
>
> After it I would like to make a resume table per days (jul):
> where maxres is max(result$res) for the "act" value where minres is
> min(result$res) for the "act" value where sumres is sum(result$res) for
> the "act" value (for example, if the 200 value ocurrs in different
> times per day(jul) consecutively 3, 5, 1, 6 and 7 times the sumres
> would be 3+5+1+6+7= 22)
>
> something like this (this are made up numbers):
>
> jul act maxres minres sumres
> 14655 0 4 1 25
> 14655 200 3 2 48
> 14655 1-199 3 1 71
> 14656 0 8 2 38
> 14656 200 15 3 60
> 14656 1-199 11 4 46
> ...
> (theoretically the sum of sumres per day(jul) should be 144)
>
> ____________
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) ____________
>
> I hope my explanation is sufficient. I appreciate any hint.
> Thank you,
>
> Zuzana
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list