[R] Conditional sum

mathijsdevaan mathijsdevaan at gmail.com
Sun Feb 20 18:53:55 CET 2011


Thanks for the quick response, but it doesn't quite do the trick. There are
two problems:
1. The ith value of the newly created variable DF$D also includes the ith
value of DF$C. (This is easily solved with DF$D = DF$D-DF$C.)
2. If group i in DF$group appears more than once in year t, the value for the
second observation of that group also includes the value of the first
observation, although both rows should count only earlier years. Example
(rows 25 and 26 repeat groups b1 and a2 in 2001):

DF = data.frame(read.table(textConnection("    group  year  C 
1 b1  1999  0.25 
2 c1  1999  0.25 
3 d1  1999  0.25 
4 a2  1999  0.25 
5 c2  1999  0.25 
6 d2  1999  0.25 
7 a3  1999  0.25 
8 b3  1999  0.25 
9 d3  1999  0.25 
10 a4  1999  0.25 
11 b4  1999  0.25 
12 c4  1999  0.25 
13 b1  2001  0.5 
14 a2  2001  0.5 
15 b1  2004  0.33 
16 c1  2004  0.33 
17 a2  2004  0.33 
18 c2  2004  0.33 
19 a3  2004  0.33 
20 b3  2004  0.33 
21 d2  1980  0.4 
22 a3  1980  0.4 
23 b4  1981  0.4 
24 c1  1981  0.4
25 b1  2001  0.5 
26 a2  2001  0.5"),header=TRUE)) 

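# Inspect the rows of each group (str() prints by itself)
by(DF,DF$group, FUN = function(x){str(x)}) 

# Sort by group, then year, so cumsum runs in chronological order
DF = DF[order(DF$group,DF$year),] 

# Running total of C within each group (includes the current row)
by(DF,DF$group, FUN = function(x){cumsum(x$C)}) 

# Running total without the current row
by(DF,DF$group, FUN = function(x){cumsum(x$C)-x$C}) 

# Store the running total (current row included) as a new column
DF$D = unlist(by(DF,DF$group, FUN = function(x){cumsum(x$C)}))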

DF$D = DF$D-DF$C
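
For comparison, here is a minimal sketch of the result I am after, assuming
"prior" means strictly earlier years within the same group. The names ex and
prior_sum are made up for illustration, and the small data set is hypothetical
(same layout as DF, with b1 repeated in 2001). It compares every row against
every other row, so it is probably too slow for large data, but it does not
depend on row order or on duplicates:

```r
# Hypothetical example data: group b1 appears twice in 2001.
ex <- data.frame(
  group = c("b1", "b1", "b1", "b1", "c1", "c1"),
  year  = c(1999, 2001, 2001, 2004, 1981, 1999),
  C     = c(0.25, 0.5, 0.5, 0.33, 0.4, 0.25),
  stringsAsFactors = FALSE
)

# For row i, add up C over all rows of the same group whose year is
# strictly earlier than the year of row i. Rows that share a group and
# year therefore get the same value and never include each other.
prior_sum <- function(group, year, C) {
  vapply(seq_along(C), function(i) {
    sum(C[group == group[i] & year < year[i]])
  }, numeric(1))
}

ex$D <- prior_sum(ex$group, ex$year, ex$C)
print(ex)
```

Both 2001 rows of b1 should end up with the 1999 value only, and the 2004 row
with the sum of the three earlier rows.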

Dieter Menne wrote:
> 
> 
> mathijsdevaan wrote:
>> 
>> I have a DF like this:
>> 
>> DF = data.frame(read.table(textConnection("    A  B  C 
>> 1 b1  1999  0.25
>> 2 c1  1999  0.25
>> ..
>> For each factor in A I want to sum the values of C for all years(Bn)
>> prior to the current year(Bi):
>> 
>> 1 b1  1999  0.25  0
>> 2 c1  1999  0.25  0.4
>> 3 d1  1999  0.25  0
>> 
>> 
> In steps following the "thinking order". You could shorten this
> considerably. I slightly changed your column names to more speakable ones.
> 
> Dieter
> 
> 
> DF = data.frame(read.table(textConnection("    group  year  C
> 1 b1  1999  0.25
> 2 c1  1999  0.25
> 3 d1  1999  0.25
> 4 a2  1999  0.25
> 5 c2  1999  0.25
> 6 d2  1999  0.25
> 7 a3  1999  0.25
> 8 b3  1999  0.25
> 9 d3  1999  0.25
> 10 a4  1999  0.25
> 11 b4  1999  0.25
> 12 c4  1999  0.25
> 13 b1  2001  0.5
> 14 a2  2001  0.5
> 15 b1  2004  0.33
> 16 c1  2004  0.33
> 17 a2  2004  0.33
> 18 c2  2004  0.33
> 19 a3  2004  0.33
> 20 b3  2004  0.33
> 21 d2  1980  0.4
> 22 a3  1980  0.4
> 23 b4  1981  0.4
> 24 c1  1981  0.4"),head=TRUE))
> 
> by(DF,DF$group, FUN = function(x){
>   print(str(x))
> })
> # Looks like we should order...
> # Other solutions are possible, but ordering all first might (not tested)
> # be the most efficient way for large sets
> DF = DF[order(DF$group,DF$year),]
> # Let's try cumsum on each group
> by(DF,DF$group, FUN = function(x){
>   cumsum(x$C)
> })
> # That's not exactly your definition of "prior"
> # correct for first value
> by(DF,DF$group, FUN = function(x){
>   cumsum(x$C)-x$C
> })
> # Now the data are in the right order, make vector of result
> DF$D = unlist(by(DF,DF$group, FUN = function(x){
>   cumsum(x$C)
> }))
> # You could sort by row names now to restore the old order
> 
> 
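
On the last comment in the quoted code (restoring the old order by row
names), a minimal sketch, assuming the row names are still the default row
numbers assigned when the data frame was created. The names ex, sorted and
restored are hypothetical:

```r
# Hypothetical example data in its original (unsorted) order.
ex <- data.frame(
  group = c("c1", "b1", "c1", "b1"),
  year  = c(1999, 2001, 1981, 1999),
  C     = c(0.25, 0.5, 0.4, 0.25),
  stringsAsFactors = FALSE
)

# Sorting by group and year keeps the original row numbers as row names.
sorted <- ex[order(ex$group, ex$year), ]

# ... per-group calculations on `sorted` would go here ...

# Row names are character strings, so convert them to integers before
# ordering; otherwise "10" would sort before "2".
restored <- sorted[order(as.integer(rownames(sorted))), ]
print(restored)
```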
