[R] Using cumsum with 'group by' ?
Pete Brecknock
Peter.Brecknock at bp.com
Fri Nov 23 16:15:39 CET 2012
TheRealJimShady wrote
> Hi Peter,
>
> Yes, I did miss an e from the first 'not' in the brackets at the end
> of the message, sorry.
>
> Thanks for that code, but when I use it, it creates a new column
> called csum which simply contains the values of the variable x . i.e.
> it just duplicates the values from x into the new column. I guess
> this means that the grouping isn't occurring correctly?
>
> James
>
> On 23 November 2012 13:57, Peter Dalgaard-2 [via R]
> <
> ml-node+s789695n4650550h76 at .nabble
> > wrote:
>>
>> On Nov 23, 2012, at 12:04 , TheRealJimShady wrote:
>>
>>> Hi Arun & everyone,
>>>
>>> Thank you very much for your helpful suggestions. I've been working
>>> through them, but have realised that my data is a little more
>>> complicated than I said and that the solutions you've kindly provided
>>> don't work. The problem is that there is more than one day of data for
>>> each person. It looks like this:
>>>
>>> id x date
>>> 1 5 2012-06-05 12:01
>>> 1 10 2012-06-05 12:02
>>> 1 45 2012-06-05 12:03
>>> 2 5 2012-06-05 12:01
>>> 2 3 2012-06-05 12:03
>>> 2 2 2012-06-05 12:05
>>> 3 5 2012-06-05 12:03
>>> 3 5 2012-06-05 12:04
>>> 3 8 2012-06-05 12:05
>>> 1 5 2012-06-08 13:01
>>> 1 9 2012-06-08 13:02
>>> 1 3 2012-06-08 13:03
>>> 2 0 2012-06-08 13:15
>>> 2 1 2012-06-08 13:18
>>> 2 8 2012-06-08 13:20
>>> 2 4 2012-06-08 13:21
>>> 3 6 2012-06-08 13:15
>>> 3 2 2012-06-08 13:16
>>> 3 7 2012-06-08 13:17
>>> 3 2 2012-06-08 13:18
>>>
>>> So what I need to do is something like this (in pseudo code anyway):
>>>
>>> - Order the data by the id field and then the date field
>>> - add a new variable called cumsum
>>> - calculate this variable as the cumulative value of X, but grouping
>>> by the id and date (not date, not date and time).
>>
>> Did you miss an 'e' on the first 'not'?? Otherwise, I'm confused.
>>
>> Why do people always forget about ave()? I'd try
>>
>> newdata <- transform(mydata, csum=ave(x, id, as.Date(date), FUN=cumsum))
>>
>> You still need to sort, of course, at least by date (incl. time). Check
>> carefully whether time zone is an issue for as.Date --- you may need the
>> tz
>> argument.
>>
>> Also notice that I shy using an R function name as a variable name. This
>> is
>> mostly superstition these days, but better safe than sorry.
>>
>>
>>>
>>> Thank you
>>>
>>> James
>>>
>>>
>>>
>>>
>>>
>>> On 23 November 2012 03:54, arun kirshna [via R]
>>> <[hidden email]> wrote:
>>>> Hi,
>>>> No problem.
>>>> One more method if you wanted to try:
>>>> library(data.table)
>>>> dat2<-data.table(dat1)
>>>> dat2[,list(x,time,Cumsum=cumsum(x)),list(id)]
>>>> # id x time Cumsum
>>>> #1: 1 5 12:01 5
>>>> #2: 1 14 12:02 19
>>>> #3: 1 6 12:03 25
>>>> #4: 1 3 12:04 28
>>>> #5: 2 98 12:01 98
>>>> #6: 2 23 12:02 121
>>>> #7: 2 1 12:03 122
>>>> #8: 2 4 12:04 126
>>>> #9: 3 5 12:01 5
>>>> #10: 3 65 12:02 70
>>>> #11: 3 23 12:03 93
>>>> #12: 3 23 12:04 116
>>>>
>>>>
>>>> A.K.
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: TheRealJimShady <[hidden email]>
>>>> To: [hidden email]
>>>> Cc:
>>>> Sent: Thursday, November 22, 2012 12:27 PM
>>>> Subject: Re: [R] Using cumsum with 'group by' ?
>>>>
>>>> Thank you very much, I will try these tomorrow morning.
>>>>
>>>> On 22 November 2012 17:25, arun kirshna [via R]
>>>> <[hidden email]> wrote:
>>>>
>>>>> HI,
>>>>> You can do this in many ways:
>>>>> dat1<-read.table(text="
>>>>> id time x
>>>>> 1 12:01 5
>>>>> 1 12:02 14
>>>>> 1 12:03 6
>>>>> 1 12:04 3
>>>>> 2 12:01 98
>>>>> 2 12:02 23
>>>>> 2 12:03 1
>>>>> 2 12:04 4
>>>>> 3 12:01 5
>>>>> 3 12:02 65
>>>>> 3 12:03 23
>>>>> 3 12:04 23
>>>>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>> dat1$Cumsum<-ave(dat1$x,dat1$id,FUN=cumsum)
>>>>> #or
>>>>> unlist(tapply(dat1$x,dat1$id,FUN=cumsum),use.names=FALSE)
>>>>> # [1] 5 19 25 28 98 121 122 126 5 70 93 116
>>>>> #or
>>>>> library(plyr)
>>>>> ddply(dat1,.(id),function(x) cumsum(x[3]))[,2]
>>>>> # [1] 5 19 25 28 98 121 122 126 5 70 93 116
>>>>> head(dat1)
>>>>> # id time x Cumsum
>>>>> #1 1 12:01 5 5
>>>>> #2 1 12:02 14 19
>>>>> #3 1 12:03 6 25
>>>>> #4 1 12:04 3 28
>>>>> #5 2 12:01 98 98
>>>>> #6 2 12:02 23 121
>>>>> A.K.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> If you reply to this email, your message will be added to the
>>>>> discussion
>>>>> below:
>>>>>
>>>>>
>>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650459.html
>>>>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>>>>> NAML
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>>
>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650461.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> ________________________________
>>>> If you reply to this email, your message will be added to the
>>>> discussion
>>>> below:
>>>>
>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650505.html
>>>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>>>> NAML
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650538.html
>>
>>> Sent from the R help mailing list archive at Nabble.com.
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: [hidden email] Priv: [hidden email]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ________________________________
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650550.html
>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>> NAML
Peter Dalgaard's suggestion works for me ...
lines<-"id x date time
1 5 2012-06-05 12:01
1 10 2012-06-05 12:02
1 45 2012-06-05 12:03
2 5 2012-06-05 12:01
2 3 2012-06-05 12:03
2 2 2012-06-05 12:05
3 5 2012-06-05 12:03
3 5 2012-06-05 12:04
3 8 2012-06-05 12:05
1 5 2012-06-08 13:01
1 9 2012-06-08 13:02
1 3 2012-06-08 13:03
2 0 2012-06-08 13:15
2 1 2012-06-08 13:18
2 8 2012-06-08 13:20
2 4 2012-06-08 13:21
3 6 2012-06-08 13:15
3 2 2012-06-08 13:16
3 7 2012-06-08 13:17
3 2 2012-06-08 13:18
"
# read in data
dat1<-read.table(textConnection(lines), header=TRUE,stringsAsFactors=FALSE)
# build csum variable
newdata <- transform(dat1, csum=ave(x, id, as.Date(date), FUN=cumsum))
# order data (not really necessary)
newdata.ord <-newdata[order(dat1[,"id"],dat1[,"date"],dat1[,"time"]),]
Or have I misinterpreted your request?
HTH
Pete (B not D)
--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650556.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list