[R] summing values by week - based on daily dates - but with some dates missing

Gabor Grothendieck ggrothendieck at gmail.com
Thu Mar 31 01:33:32 CEST 2011


On Wed, Mar 30, 2011 at 5:10 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Yes, zoo! That's what I forgot. It's great.
> Henrique, thanks a lot! One question:
>
> if the data are as I originally posted - then week numbered 52 is
> actually the very first week (it straddles 2008-2009).
> What if the data are much longer (like in the code below - same as before,
> but more dates) so that we have more than 1 year to deal with?
> It looks like this code is lumping everything into 52 weeks. And my
> goal is to keep each week independent. If I have 2 years, then it
> should be 100+ weeks. Makes sense?
> Thank you!
>
> ### Creating a longer example data set:
> mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
> myfactor<-c(rep("group.1",500),rep("group.2",500))
> set.seed(123)
> myvalues<-runif(1000,0,1)
> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
> (myframe)
> dim(myframe)
>
> ## Removing same rows (dates) unsystematically:
> set.seed(123)
> removed.group1<-sample(1:500,size=150,replace=F)
> set.seed(456)
> removed.group2<-sample(501:1000,size=150,replace=F)
> to.remove<-c(removed.group1,removed.group2);length(to.remove)
> to.remove<-to.remove[order(to.remove)]
> myframe<-myframe[-to.remove,]
> (myframe)
> dim(myframe)
> names(myframe)
>
> library(zoo)
> wk <- as.numeric(format(myframe$dates, '%W'))
> is.na(wk) <- wk == 0
> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
> solution<-solution[order(solution$group),]
> write.csv(solution,file="test.csv",row.names=F)

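Here is a variation on Henrique's answer that keeps each week separate
across years.

It uses nextfri, a one-line function that appears in the zoo-quickref
vignette and is reproduced below.  If d is a vector of class "Date",
then nextfri(d) is a vector of the same length in which each element is
replaced by the date of the first Friday on or after it (a date that is
already a Friday is left unchanged).  Because each week is labelled by
the date on which it ends rather than by a week-of-year number, weeks
from different years are not lumped together.  Friday is represented by
5 in the formula, so you can change both 5s to 0 for Sunday, 1 for
Monday, ..., 6 for Saturday if you want your weeks to end on some other
day of the week.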
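
The arithmetic behind the formula can be sketched as follows.  This is
only an illustration: next_wday and its wday argument are hypothetical
names, not part of zoo, and the origin is spelled out so the sketch does
not rely on zoo being loaded.  The 4 in the formula comes from the fact
that 1970-01-01, day 0 of the "Date" scale, was a Thursday.

```r
# Hypothetical generalisation of nextfri(); `next_wday` and `wday` are
# illustrative names. wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
next_wday <- function(x, wday = 5) {
  # 1970-01-01 (day 0) was a Thursday, i.e. weekday 4. Shifting by
  # (4 - wday) makes every target weekday a multiple of 7 ...
  shifted <- as.numeric(x) + 4 - wday
  # ... so rounding up to a multiple of 7 and undoing the shift gives
  # the first target weekday on or after x.
  as.Date(7 * ceiling(shifted / 7) + wday - 4, origin = "1970-01-01")
}

d <- as.Date(c("2008-12-29", "2009-01-02", "2009-01-03"))
next_wday(d)      # weeks ending on Friday
next_wday(d, 0)   # weeks ending on Sunday
```

With the default wday = 5 this should give the same dates as nextfri.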

> nextfri <- function(x) 7 * ceiling(as.numeric(x-5+4) / 7) + as.Date(5-4)
> soln2 <- aggregate(value ~ group + nextfri(dates), myframe, FUN = sum)
> head(soln2)
    group nextfri(dates)     value
1 group.1     2009-01-02 2.1377493
2 group.2     2009-01-02 0.7335145
3 group.1     2009-01-09 3.4309641
4 group.2     2009-01-09 2.7102963
5 group.1     2009-01-16 2.8690217
6 group.2     2009-01-16 3.2792832
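
As a rough check that the weeks stay independent over more than one
year, the sketch below compares the number of distinct week-of-year
labels with the number of distinct week-ending dates over the same
500-day span as the example data.  It uses a single group with no rows
removed, and the explicit origin argument is an addition so that it also
runs without zoo loaded.

```r
# Same 500-day span as the example data (one group, no rows removed).
dates <- seq(as.Date("2008-12-29"), length.out = 500, by = "day")

# nextfri() with the origin spelled out so it does not depend on zoo.
nextfri <- function(x) {
  7 * ceiling(as.numeric(x - 5 + 4) / 7) + as.Date(5 - 4, origin = "1970-01-01")
}

week_of_year <- format(dates, "%W")  # week-of-year label, repeats every year
week_ending  <- nextfri(dates)       # Friday that ends each calendar week

length(unique(week_of_year))  # capped by the number of weeks in a year
length(unique(week_ending))   # one value per calendar week in the span
```

The second count should be the larger one, since 500 days cover more
than 70 calendar weeks while the week-of-year label starts over each
January.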


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


