[R] summing values by week - based on daily dates - but with some dates missing
jim holtman
jholtman at gmail.com
Wed Mar 30 22:34:40 CEST 2011
Here is a way to take a sequence of dates and break it into weeks
that start on Monday (you can change the start day). The week index
keeps counting across year boundaries:
> # create a couple of years of dates
> x <- seq(as.Date('2009-7-8'), as.Date('2012-3-7'), by = '1 day')
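> # find the Monday on or before the first date and use it as the origin;
> # each date is then numbered by whole weeks elapsed since that origin,
> # so the numbering runs straight across year boundaries.
> # "%w" is 0 for Sunday, so shift by 6 modulo 7 to get days since Monday
> firstMonday <- x[1L] - (as.numeric(format(x[1L], "%w")) + 6) %% 7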
> weekSeq <- as.numeric(x - firstMonday) %/% 7
> # so you have the following vector to split the dates into weeks
> head(weekSeq, 30)
[1] 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4
>
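To connect this to the original question, here is a rough sketch of how the same idea could be used to sum values by group and by week when some dates are missing. The helper name week_start and its first argument are my own invention for illustration (they are not from any package), and the example data only imitates the data in the original post, so the sums will not match the numbers quoted there.

```r
# Hypothetical helper (name and arguments are assumptions, not from the thread):
# map each date to the first day of its week.
# "%w" gives 0 = Sunday ... 6 = Saturday; first = 1 means weeks start on Monday.
week_start <- function(d, first = 1) {
  d - (as.numeric(format(d, "%w")) - first) %% 7
}

# Example data in the spirit of the original post (not the same rows)
set.seed(123)
myframe <- data.frame(
  dates = rep(seq(as.Date("2008-12-29"), length.out = 43, by = "day"), 2),
  group = rep(c("group.1", "group.2"), each = 43),
  value = runif(86)
)
# Drop some rows at random so that several dates are missing
myframe <- myframe[-sample(nrow(myframe), 22), ]

# Label every row with the Monday of its week, then sum within group and week.
# Each row is labelled on its own, so a missing Monday does not matter.
myframe$week <- week_start(myframe$dates)
weekly <- aggregate(value ~ group + week, data = myframe, FUN = sum)
weekly <- weekly[order(weekly$group, weekly$week), ]
weekly
```

Because the grouping key is the date of the Monday rather than a year.week string, the days from 2008-12-29 through 2009-01-04 all fall into one group, which should take care of weeks that straddle two years. Calling week_start(dates, first = 0) would give weeks that start on Sunday instead.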
On Wed, Mar 30, 2011 at 3:35 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Henrique, this is great, thank you!
>
> It's almost what I was looking for! Only one small thing - it doesn't
> "merge" the results for weeks that "straddle" 2 years. In my example -
> last week of year 2008 and the very first week of 2009 are one week.
> Any way to "join them"?
> Asking because in reality I'll have many years and hundreds of groups
> - hence, it'll be hard to do it manually.
>
>
> BTW - does format(dates,"%Y.%W") always consider weeks as starting with Mondays?
>
> Thank you very much!
> Dimitri
>
>
> On Wed, Mar 30, 2011 at 2:55 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>> Try this:
>>
>> aggregate(value ~ group + format(dates, "%Y.%W"), myframe, FUN = sum)
>>
>>
>> On Wed, Mar 30, 2011 at 11:23 AM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> Dear everybody,
>>>
>>> I have the following challenge. I have a data set with 2 subgroups,
>>> dates (days), and corresponding values (see example code below).
>>> Within each subgroup: I need to aggregate (sum) the values by week -
>>> for weeks that start on a Monday (for example, 2008-12-29 was a
>>> Monday).
>>> I find it difficult because I have missing dates in my data - so that
>>> sometimes I don't even have the date for some Mondays. So, I can't
>>> write a proper loop.
>>> I want my output to look something like this:
>>> group    dates       value
>>> group.1  2008-12-29  3.0937
>>> group.1  2009-01-05  3.8833
>>> group.1  2009-01-12  1.362
>>> ...
>>> group.2  2008-12-29  2.250
>>> group.2  2009-01-05  1.4057
>>> group.2  2009-01-12  3.4411
>>> ...
>>>
>>> Thanks a lot for your suggestions! The code is below:
>>> Dimitri
>>>
>>> ### Creating example data set:
>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 43, by = "day"),2)
>>> myfactor<-c(rep("group.1",43),rep("group.2",43))
>>> set.seed(123)
>>> myvalues<-runif(86,0,1)
>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>> (myframe)
>>> dim(myframe)
>>>
>>> ## Removing some rows (dates) unsystematically:
>>> set.seed(123)
>>> removed.group1<-sample(1:43,size=11,replace=FALSE)
>>> set.seed(456)
>>> removed.group2<-sample(44:86,size=11,replace=FALSE)
>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>> to.remove<-sort(to.remove)
>>> myframe<-myframe[-to.remove,]
>>> (myframe)
>>>
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah Consulting
>>> www.ninah.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?