[R] summing values by week - based on daily dates - but with some dates missing

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Wed Mar 30 23:10:30 CEST 2011


Yes, zoo! That's what I forgot. It's great.
Henrique, thanks a lot! One question:

If the data are as I originally posted, then the week numbered 52 is
actually the very first week (it straddles 2008-2009).
What if the data are much longer (as in the code below: same as before,
but more dates), so that we have more than 1 year to deal with?
It looks like this code lumps everything into 52 weeks, whereas my
goal is to keep each week independent. If I have 2 years, then there
should be 100+ weeks. Does that make sense?
Thank you!
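
A small illustration of why I suspect the lumping (the two dates are
picked just for this example, they are not from my data): "%W" is the
week of the year with Monday as the first day, so the same number comes
back every year unless the year is part of the label.

```r
# Two Mondays one year apart (illustrative dates, not taken from myframe)
d <- as.Date(c("2009-03-02", "2010-03-01"))

wk.only   <- format(d, "%W")     # week of year only: both dates get the same label
wk.w.year <- format(d, "%Y.%W")  # year plus week: the two labels differ

wk.only
wk.w.year
```

So grouping on the bare week number would add values from different
years together.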

### Creating a longer example data set:
mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
myfactor<-c(rep("group.1",500),rep("group.2",500))
set.seed(123)
myvalues<-runif(1000,0,1)
myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
(myframe)
dim(myframe)

## Removing some rows (dates) unsystematically:
set.seed(123)
removed.group1<-sample(1:500,size=150,replace=FALSE)
set.seed(456)
removed.group2<-sample(501:1000,size=150,replace=FALSE)
to.remove<-c(removed.group1,removed.group2);length(to.remove)
to.remove<-sort(to.remove)
myframe<-myframe[-to.remove,]
(myframe)
dim(myframe)
names(myframe)

library(zoo)
wk <- as.numeric(format(myframe$dates, '%W'))
is.na(wk) <- wk == 0
solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
solution<-solution[order(solution$group),]
write.csv(solution,file="test.csv",row.names=F)
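
One possible way to keep every week separate, written here only as a
sketch and not as a tested solution for the full data: label each row
with the Monday on or before its date and aggregate on that label. The
data frame "toy" and the column "week.start" are names made up for this
example; "%u" gives the weekday number with 1 = Monday and 7 = Sunday.

```r
# Toy data (hypothetical): dates around two different year boundaries
toy <- data.frame(
  dates = as.Date(c("2008-12-29", "2009-01-02", "2009-01-05",
                    "2009-12-28", "2010-01-01")),
  group = "group.1",
  value = c(1, 2, 3, 4, 5)
)

# Weekday number: 1 = Monday, ..., 7 = Sunday
wday <- as.integer(format(toy$dates, "%u"))

# Step back to the Monday that starts each date's week
toy$week.start <- toy$dates - (wday - 1L)

# Sum by group and week start; a week that straddles two years stays one week
by.week <- aggregate(value ~ group + week.start, data = toy, FUN = sum)
by.week <- by.week[order(by.week$group, by.week$week.start), ]
by.week
```

Because the label is a full date, weeks from different years cannot
collide, and a missing Monday in the data does not matter. With the
real data, "myframe" would take the place of "toy".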



On Wed, Mar 30, 2011 at 4:45 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> Try this:
>
> library(zoo)
> wk <- as.numeric(format(myframe$dates, '%W'))
> is.na(wk) <- wk == 0
> aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>
>
>
> On Wed, Mar 30, 2011 at 4:35 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Henrique, this is great, thank you!
>>
>> It's almost what I was looking for! Only one small thing - it doesn't
>> "merge" the results for weeks that "straddle" 2 years. In my example -
>> last week of year 2008 and the very first week of 2009 are one week.
>> Any way to "join them"?
>> Asking because in reality I'll have many years and hundreds of groups
>> - hence, it'll be hard to do it manually.
>>
>>
>> BTW - does format(dates,"%Y.%W") always consider weeks as starting with Mondays?
>>
>> Thank you very much!
>> Dimitri
>>
>>
>> On Wed, Mar 30, 2011 at 2:55 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>>> Try this:
>>>
>>> aggregate(value ~ group + format(dates, "%Y.%W"), myframe, FUN = sum)
>>>
>>>
>>> On Wed, Mar 30, 2011 at 11:23 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>> Dear everybody,
>>>>
>>>> I have the following challenge. I have a data set with 2 subgroups,
>>>> dates (days), and corresponding values (see example code below).
>>>> Within each subgroup: I need to aggregate (sum) the values by week -
>>>> for weeks that start on a Monday (for example, 2008-12-29 was a
>>>> Monday).
>>>> I find it difficult because I have missing dates in my data - so that
>>>> sometimes I don't even have the date for some Mondays. So, I can't
>>>> write a proper loop.
>>>> I want my output to look something like this:
>>>> group   dates   value
>>>> group.1 2008-12-29  3.0937
>>>> group.1 2009-01-05  3.8833
>>>> group.1 2009-01-12  1.362
>>>> ...
>>>> group.2 2008-12-29  2.250
>>>> group.2 2009-01-05  1.4057
>>>> group.2 2009-01-12  3.4411
>>>> ...
>>>>
>>>> Thanks a lot for your suggestions! The code is below:
>>>> Dimitri
>>>>
>>>> ### Creating example data set:
>>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 43, by = "day"),2)
>>>> myfactor<-c(rep("group.1",43),rep("group.2",43))
>>>> set.seed(123)
>>>> myvalues<-runif(86,0,1)
>>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>>> (myframe)
>>>> dim(myframe)
>>>>
>>>> ## Removing some rows (dates) unsystematically:
>>>> set.seed(123)
>>>> removed.group1<-sample(1:43,size=11,replace=F)
>>>> set.seed(456)
>>>> removed.group2<-sample(44:86,size=11,replace=F)
>>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>>> to.remove<-to.remove[order(to.remove)]
>>>> myframe<-myframe[-to.remove,]
>>>> (myframe)
>>>>
>>>>
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>> Ninah Consulting
>>>> www.ninah.com
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Henrique Dallazuanna
>>> Curitiba-Paraná-Brasil
>>> 25° 25' 40" S 49° 16' 22" O
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


