[R] Conditional sum
mathijsdevaan
mathijsdevaan at gmail.com
Mon Feb 21 16:54:28 CET 2011
I am still struggling (I'm an R novice). Basically I just want to sum the
values per group if the year condition is met. I have the feeling that using
a loop would work, but I am not really familiar with loops. Something like
this?
for(DF$C in 1:length(DF$C))
{
DF<-which(DF$year<DF[i,"year"])
DF$D<-ave(DF$C,DF$group,FUN = function(x) sum(x))
}
It doesn't work and probably looks awful, so can someone point me in the
right direction? Thanks!
M
mathijsdevaan wrote:
>
>
> Dieter Menne wrote:
>>
>> In steps following the "thinking order". You could shorten this
>> considerably. I slightly changed you column names to more speakable ones.
>>
>> Dieter
>>
>>
>> DF = data.frame(read.table(textConnection(" group year C
>> 1 b1 1999 0.25
>> 2 c1 1999 0.25
>> 3 d1 1999 0.25
>> 4 a2 1999 0.25
>> 5 c2 1999 0.25
>> 6 d2 1999 0.25
>> 7 a3 1999 0.25
>> 8 b3 1999 0.25
>> 9 d3 1999 0.25
>> 10 a4 1999 0.25
>> 11 b4 1999 0.25
>> 12 c4 1999 0.25
>> 13 b1 2001 0.5
>> 14 a2 2001 0.5
>> 15 b1 2004 0.33
>> 16 c1 2004 0.33
>> 17 a2 2004 0.33
>> 18 c2 2004 0.33
>> 19 a3 2004 0.33
>> 20 b3 2004 0.33
>> 21 d2 1980 0.4
>> 22 a3 1980 0.4
>> 23 b4 1981 0.4
>> 24 c1 1981 0.4"),head=TRUE))
>>
>> by(DF,DF$group, FUN = function(x){
>> print(str(x))
>> })
>> # Looks like we should order...
>> # Other solutions are possible, but ordering all first might (not tested)
>> # be the most efficient way for large sets
>> DF = DF[order(DF$group,DF$year),]
>> # Let's try cumsum on each group
>> by(DF,DF$group, FUN = function(x){
>> cumsum(x$C)
>> })
>> # That's not exactly your defininition of "prior"
>> # correct for first value
>> by(DF,DF$group, FUN = function(x){
>> cumsum(x$C)-x$C
>> })
>> # Now the data are in right order, make vector of result
>> DF$D = unlist(by(DF,DF$group, FUN = function(x){
>> cumsum(x$C)
>> }))
>> # You could sort by row names now to restore the old order
>>
>>
>
> Thanks for the quick response, but it doesn't do the trick. There are two
> problems:
> 1. The ith value of the newly created variable DF$D also includes the ith
> value of DF$C (this problem is easily solved by DF$D = DF$D-DF$C.)
> 2. If group i in DF$group appears more than once in year t, the value of
> the second observation of that group exceeds (includes) the value of the
> first observation. Example (group b1 and a2 in 2001 are duplicated):
>
> DF = data.frame(read.table(textConnection(" group year C
> 1 b1 1999 0.25
> 2 c1 1999 0.25
> 3 d1 1999 0.25
> 4 a2 1999 0.25
> 5 c2 1999 0.25
> 6 d2 1999 0.25
> 7 a3 1999 0.25
> 8 b3 1999 0.25
> 9 d3 1999 0.25
> 10 a4 1999 0.25
> 11 b4 1999 0.25
> 12 c4 1999 0.25
> 13 b1 2001 0.5
> 14 a2 2001 0.5
> 15 b1 2004 0.33
> 16 c1 2004 0.33
> 17 a2 2004 0.33
> 18 c2 2004 0.33
> 19 a3 2004 0.33
> 20 b3 2004 0.33
> 21 d2 1980 0.4
> 22 a3 1980 0.4
> 23 b4 1981 0.4
> 24 c1 1981 0.4
> 25 b1 2001 0.5
> 26 a2 2001 0.5"),head=TRUE))
>
> by(DF,DF$group, FUN = function(x){print(str(x))})
>
> DF = DF[order(DF$group,DF$year),]
>
> by(DF,DF$group, FUN = function(x){cumsum(x$C)})
>
> by(DF,DF$group, FUN = function(x){cumsum(x$C)-x$C})
>
> DF$D = unlist(by(DF,DF$group, FUN = function(x){cumsum(x$C)}))
>
> DF$D = DF$D-DF$C
>
--
View this message in context: http://r.789695.n4.nabble.com/Conditional-sum-tp3315163p3317573.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list