[R] running count in data.frame
Mark Knecht
markknecht at gmail.com
Wed Jul 1 04:49:18 CEST 2009
Hi,
I need to keep a running count of events that have happened in my
data.frame. I found a document called usingR with an example that
does this for random coin flips, and I tried to modify it. My version
seems to work at first, but then it stops and I don't understand why.
I'm essentially trying to duplicate the Excel capability of
=SUM($A$1:$A(Row number)).
The example looked like this:
x = cumsum(sample(c(-1,1),100,replace=T))
which does seem to work: (100 shortened to 20 for email)
> cumsum(sample(c(-1,1),20,replace=T))
[1] 1 0 1 0 1 2 3 4 5 4 3 4 5 6 7 6 5 4 5 6
> cumsum(sample(c(-1,1),20,replace=T))
[1] 1 2 1 2 1 2 3 2 3 4 5 6 7 8 7 8 7 8 9 8
> cumsum(sample(c(-1,1),20,replace=T))
[1] 1 0 1 0 1 0 1 2 3 4 5 6 7 8 7 8 7 6 7 8
> cumsum(sample(c(-1,1),20,replace=T))
[1] 1 0 1 0 1 0 -1 0 1 0 1 0 1 2 1 0 1 2 3 4
> cumsum(sample(c(-1,1),20,replace=T))
[1] 1 2 1 0 -1 0 -1 -2 -1 -2 -1 -2 -1 -2 -3 -2 -3 -4 -5 -6
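To connect this with the Excel formula, here is a minimal sketch on a fixed vector (the values are made up for illustration, not taken from usingR). It suggests that element n of cumsum() is the sum of the first n inputs, which is what the filled-down SUM does:

```r
# Hypothetical input values, chosen only for illustration
a <- c(3, 1, 4, 1, 5)

# running[n] is the sum of a[1] through a[n]
running <- cumsum(a)

# The same result computed the long way, one prefix sum per row
check <- sapply(seq_along(a), function(n) sum(a[1:n]))

print(running)
print(check)
```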
However, that example doesn't have to read from a data.frame, so I
tried to build on some help I got earlier today, but it isn't working
for me. The goal is for MyFrame$lc to keep a running total of events
in the MyFrame$l column, and likewise $pc for $p. It seems that $lc
starts off OK until it gets to a 0 and then resets back to 0, which I
don't want. The $pc counter never seems to count. I also get a warning
message I don't understand, so clearly I'm doing something very wrong
here:
> F1 <- RunningCount(F1)
Warning messages:
1: In MyFrame$pc[pos] <- cumsum(as.integer(pos)) :
  number of items to replace is not a multiple of replacement length
2: In MyFrame$lc[pos] <- cumsum(as.integer(pos)) :
  number of items to replace is not a multiple of replacement length
> F1
    x  y p  l pc lc
1   1 -4 0 -4  0  1
2   2 -3 0 -3  0  2
3   3 -2 0 -2  0  3
4   4 -1 0 -1  0  4
5   5  0 0  0  0  0
6   6  1 1  0  0  0
7   7  2 2  0  0  0
8   8  3 3  0  0  0
9   9  4 4  0  0  0
10 10  5 5  0  0  0
>
I wanted $lc to go up to 4 and then hold 4 until the end. $pc should
have stayed 0 until line 6 and then gone up to 5 at the end.
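One possible reading of the warning, sketched on a hypothetical five-element vector rather than the data above: an indexed assignment such as lc[pos] has one slot per TRUE in pos, while cumsum(as.integer(pos)) has one value per row. When the two lengths differ, R appears to fill the slots with the first few values and warn:

```r
# Hypothetical five-element column, standing in for MyFrame$l
l  <- c(-2, -1, 0, 0, 0)
lc <- rep(0, length(l))

pos    <- l < 0                    # TRUE for the first two rows only
counts <- cumsum(as.integer(pos))  # one value per row: 1 2 2 2 2

# lc[pos] selects 2 slots but counts holds 5 values, so only the
# first 2 values are used and R warns about the length mismatch
lc[pos] <- counts

print(sum(pos))        # number of slots selected by pos
print(length(counts))  # number of replacement values
print(lc)              # rows where pos is FALSE keep their initial 0
```

If that reading is right, it would also explain why $pc never counts: its first five replacement values are all 0.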
Any and all inputs appreciated on what I'm doing wrong.
Thanks,
Mark
AddCols = function (MyFrame) {
    MyFrame$p<-0
    MyFrame$l<-0
    MyFrame$pc<-0
    MyFrame$lc<-0
    return(MyFrame)
}

BinPosNeg = function (MyFrame) {
    ## Positive y in p column, negative y in l column
    pos <- MyFrame$y > 0
    MyFrame$p[pos] <- MyFrame$y[pos]
    MyFrame$l[!pos] <- MyFrame$y[!pos]
    return(MyFrame)
}

RunningCount = function (MyFrame) {
    ## Running count of p & l events
    pos <- (MyFrame$p > 0)
    MyFrame$pc[pos] <- cumsum(as.integer(pos))
    pos <- (MyFrame$l < 0)
    MyFrame$lc[pos] <- cumsum(as.integer(pos))
    return(MyFrame)
}

F1 <- data.frame(x=1:10, y=-4:5)
F1 <- AddCols(F1)
F1
F1 <- BinPosNeg(F1)
F1
F1 <- RunningCount(F1)
F1
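
For reference, a minimal sketch of one way the running counts described above could be produced, assuming the intent is a count of rows so far with p > 0 (for $pc) and l < 0 (for $lc). The names RunningCount2 and F2 are hypothetical and not part of the script above. The sketch assigns the whole cumsum() result to the column instead of indexing with pos:

```r
# Hypothetical replacement for RunningCount(); not the original function
RunningCount2 <- function(MyFrame) {
  # cumsum() of a logical vector counts the TRUEs seen so far and
  # returns one value per row, so the full column can be assigned
  MyFrame$pc <- cumsum(MyFrame$p > 0)
  MyFrame$lc <- cumsum(MyFrame$l < 0)
  MyFrame
}

# Rebuild the same test frame without relying on the earlier functions
F2 <- data.frame(x = 1:10, y = -4:5)
F2$p <- ifelse(F2$y > 0, F2$y, 0)  # positive y values, else 0
F2$l <- ifelse(F2$y < 0, F2$y, 0)  # negative y values, else 0

F2 <- RunningCount2(F2)
print(F2)
```

With this frame, $lc climbs to 4 and holds there, and $pc stays at 0 through row 5 and then climbs to 5.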