# [R] zoo:rollapply by multiple grouping factors

Gabor Grothendieck ggrothendieck at gmail.com
Mon Apr 4 00:27:07 CEST 2011

```
On Sun, Apr 3, 2011 at 11:58 AM, Mark Novak <mnovak1 at ucsc.edu> wrote:
> # Hi there,
> # I am trying to apply a function over a moving window to a large number of
> # multivariate time series that are grouped in a nested set of factors. I
> # have spent a few days searching for solutions with no luck, so any
> # suggestions are much appreciated.
>
> # The data I have are for the abundance dynamics of multiple species
> # observed in multiple fixed plots at multiple sites. (In total I have 7
> # sites, ~3-5 plots/site, ~150 species/plot, for 60 time steps each.) So my
> # data look something like this:
>
> dat<-data.frame(Site=rep(1), Plot=rep(c(rep(1,8),rep(2,8),rep(3,8)),1),
>   Time=rep(c(1,1,2,2,3,3,4,4)), Sp=rep(1:2), Count=sample(24))
> dat
>
> # Let the function I want to apply over a right-aligned window of w=2 time
> # steps be:
> cv<-function(x){sd(x)/mean(x)}
> w<-2
>
> # The final output I want would look something like this:
> Out<-data.frame(dat,CV=round(c(NA,NA,runif(6,0,1),c(NA,NA,runif(6,0,1)),c(NA,NA,runif(6,0,1))),2))
>
> # I could reshape and apply zoo:rollapply() to a given plot at a given site,
> # and reshape again as follows:
> library(zoo)
> a<-subset(dat,Site==1&Plot==1)
> b<-reshape(a[-c(1,2)],v.names='Count',idvar='Time',timevar='Sp',direction='wide')
> d<-zoo(b[,-1],b[,1])
> d
> out<-rollapply(d, w, cv, na.pad=T, align='right')
> out
>
> # I would therefore have to loop through all my sites and plots, which,
> # although it deals with all species at once, still seems exceedingly
> # inefficient.
>
> # So the question is: how do I use something like aggregate.zoo or tapply or
> # even lapply to apply rollapply to each species' time series?
>
> # The closest I've come is the following two approaches:
>
> # First let:
> datx<-list(Site=dat$Site,Plot=dat$Plot,Sp=dat$Sp)
> daty<-dat$Count
>
> # Method 1.
> out1<-tapply(seq(along=daty),datx,function(i,x=daty){ rollapply(zoo(x[i]),
>   w, cv, na.pad=T, align='right') })
> out1
> out1[,,1]
>
> # Which "works" in that it gives me the right answers, but in a format from
> # which I can't figure out how to get back into the format I want.
>
> # Method 2.
> fun<-function(x){y<-zoo(x);coredata(rollapply(y, w, cv, na.pad=T, align='right'))}
> out2<-aggregate(daty,by=datx,fun)
> out2
>
> # Which superficially "works" better, but again only in a format I can't
> # figure out how to use, because the output seems to be a mix of data.frame
> # and lists.
> out2[1,4]
> out2[1,5]
> is.data.frame(out2)
> is.list(out2)
>
> # The situation is made more problematic by the fact that the time point of
> # first survey can differ between plots (e.g., site1-plot3 may only start at
> # time-point 3). As in...
> dat2<-dat
> dat2<-dat2[-which(dat2$Plot==3 & dat2$Time<3),]
> dat2
>
> # I must therefore ensure that I'm keeping track of the true time associated
> # with each value, not just the order of their occurrences. This information
> # is (seemingly) lost by both methods.
> datx<-list(Site=dat2$Site,Plot=dat2$Plot,Sp=dat2$Sp)
> daty<-dat2$Count
>
> # Method 1.
> out3<-tapply(seq(along=daty),datx,function(i,x=daty){ rollapply(zoo(x[i]),
>   w, cv, na.pad=T, align='right') })
> out3
> out3[1,3,1]
> time(out3[1,3,1])
>
> # Method 2
> out4<-aggregate(daty,by=datx,fun)
> out4
> time(out4[3,4])
>
>
> # Any thoughts and suggestions are much appreciated!
>
> # R 2.12.2 GUI 1.36 Leopard build 32-bit (5691); zoo 1.6-4
>
> # Thanks!
> # -mark
>

Try ave:

dat$cv <- ave(dat$Count, dat[c("Site", "Plot", "Sp")], FUN =
  function(x) rollapply(zoo(x), 2, cv, na.pad = TRUE, align = "right"))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

```
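For readers who want to check the `ave` idea without zoo installed, here is a minimal base-R sketch. It replaces `rollapply` with a hypothetical helper, `roll_right()` (not part of zoo or base R), that applies `FUN` to each right-aligned window of `w` values and pads the first `w - 1` positions with `NA`, mirroring `rollapply(..., na.pad = TRUE, align = "right")`. `ave` splits `Count` by every Site/Plot/Sp combination, applies the function to each piece and writes the results back in the original row order, which is why the output can be attached directly as a new column. Like the reply, it assumes the rows within each group are already sorted by `Time`.

```r
# Coefficient of variation, as defined in the question
cv <- function(x) sd(x) / mean(x)

# Hypothetical helper (a stand-in for zoo::rollapply): element i receives
# FUN(x[(i - w + 1):i]); the first w - 1 elements stay NA.
roll_right <- function(x, w, FUN) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= w) {
    for (i in w:n) out[i] <- FUN(x[(i - w + 1):i])
  }
  out
}

# Example data laid out like the question's `dat`
# (1 site, 3 plots, 4 time steps, 2 species)
set.seed(1)
dat <- data.frame(Site = 1, Plot = rep(1:3, each = 8),
                  Time = rep(c(1, 1, 2, 2, 3, 3, 4, 4), 3),
                  Sp = rep(1:2, 12), Count = sample(24))

# One rolling series per Site/Plot/Sp combination; ave() returns the
# results aligned with the original rows.
dat$CV <- ave(dat$Count, dat[c("Site", "Plot", "Sp")],
              FUN = function(x) roll_right(x, 2, cv))
print(dat)
```

With zoo loaded, `roll_right(x, 2, cv)` can be swapped back for `rollapply(zoo(x), 2, cv, fill = NA, align = "right")`; recent zoo releases document `na.pad` as deprecated in favour of `fill = NA`.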
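The question's `dat2` case (plot 3 first surveyed at time 3) raises the issue of keeping the true times. Rolling over row positions, as the reply does, still gives correct windows when each group's rows are sorted and have no skipped time steps; defining the window on `Time` itself removes that requirement. The following is a minimal sketch under stated assumptions: times are integer steps, each Site/Plot/Sp group has at most one row per time, and a window that does not hold exactly `w` observations returns `NA`. `roll_right_time()` is a hypothetical helper, and passing row indices to `ave` lets the per-group function read both `Count` and `Time`.

```r
# Coefficient of variation, as defined in the question
cv <- function(x) sd(x) / mean(x)

# Hypothetical helper: a window defined on time values, not row positions.
# Row i uses every value with tm[i] - w < tm <= tm[i]. If that window does
# not contain exactly w values (start of a series, or a missing survey),
# the result is NA. Assumes integer time steps and one row per time.
roll_right_time <- function(x, tm, w, FUN) {
  vapply(seq_along(x), function(i) {
    in_win <- tm > tm[i] - w & tm <= tm[i]
    if (sum(in_win) != w) return(NA_real_)
    # Sort the window by time so order-sensitive FUNs also behave
    FUN(x[in_win][order(tm[in_win])])
  }, numeric(1))
}

set.seed(1)
dat <- data.frame(Site = 1, Plot = rep(1:3, each = 8),
                  Time = rep(c(1, 1, 2, 2, 3, 3, 4, 4), 3),
                  Sp = rep(1:2, 12), Count = sample(24))
# As in the question: plot 3 is first surveyed at time 3
dat2 <- dat[!(dat$Plot == 3 & dat$Time < 3), ]

# Pass row indices through ave() so each group's function can look up
# both Count and Time for its own rows, in any row order.
dat2$CV <- ave(seq_len(nrow(dat2)), dat2[c("Site", "Plot", "Sp")],
               FUN = function(i) roll_right_time(dat2$Count[i], dat2$Time[i],
                                                 2, cv))
print(dat2)
```

In the zoo versions of Methods 1 and 2, building each series as `zoo(x[i], dat2$Time[i])` instead of `zoo(x[i])` would keep the survey times in the index, so `time()` returns them; `rollapply` still windows by observation rather than by elapsed time.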