[R-SIG-Finance] Aggregating Statistics By Time Interval

Gabor Grothendieck ggrothendieck at gmail.com
Fri Aug 3 12:35:40 CEST 2007


Can you provide a reproducible example that exhibits the warning?
Redoing it in a more easily reproducible way, using the data
in your post, gives me no warning:

> tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
+ 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
+ 2e-04, 1e-04))
>
> twas <-
+  function(dat) {
+    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
+  sum)/sum(diff(dat$time)) * 100.0)
+ }
> now <- Sys.time()
> epoch <- now - as.numeric(now)
> z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
> z
      1e-04    2e-04
07 66.66667 33.33333
> R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"


Here is the input again, without the prompts:

tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
2e-04, 1e-04))
twas <-
 function(dat) {
   data.frame(tapply(diff(dat$time), head(dat$spread, -1),
 sum)/sum(diff(dat$time)) * 100.0)
}
now <- Sys.time()
epoch <- now - as.numeric(now)
z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
z
R.version.string # XP
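A minimal sketch of one way to avoid the recycling, assuming the warning comes from hours that do not see every spread level. twas returns the percentage of time spent at each spread level observed in the group, so an hour with two levels yields a shorter vector than an hour with four, and rbind() recycles the short one. Fixing the levels once with factor(..., levels = ) makes tapply() return one cell per level in every hour. The names twas_fixed and spread_levels, the two extra rows at 12:00 UTC, counting unseen levels as 0, and bucketing by UTC hour (instead of the local-time epoch used above) are all assumptions for illustration.

```r
# Sample data from the thread, plus two HYPOTHETICAL rows in the next hour
# (12:00 UTC) that only ever see a 3e-04 spread.
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
                           1185882792, 1185882795, 1185883200, 1185883210),
                  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 1e-04,
                             3e-04, 3e-04))

# Hypothetical helper: fix the spread levels once, over the whole data set.
spread_levels <- sort(unique(tmp$spread))

twas_fixed <- function(dat) {
  dt <- diff(dat$time)
  # Each interval is attributed to the spread in force at its start.
  s <- factor(head(dat$spread, -1), levels = spread_levels)
  # With explicit levels, tapply() returns one cell per level (NA if unseen).
  w <- tapply(dt, s, sum) / sum(dt) * 100
  w[is.na(w)] <- 0  # assumption: an unseen level counts as 0% of the hour
  # Plain named vector, same length and names for every hour
  setNames(as.numeric(w), levels(s))
}

# Bucket by hour of day in UTC (assumption; the thread used local time).
hour <- format(as.POSIXct(tmp$time, origin = "1970-01-01", tz = "UTC"), "%H")
z <- do.call("rbind", lapply(split(tmp, hour), twas_fixed))
z
```

barplot(t(z)) should then draw one stacked bar per hour. As in the original, the interval that straddles an hour boundary is dropped, and an hour with a single observation has no intervals at all, so it would need separate handling.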



On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
> Hi
>
> I've been wrestling with this a little bit, using the example in the email
> that Gabor pointed me to as a reference, and I think I have almost got what
> I want... however, it's still not quite right.
>
> I have a variable, tmp, with two columns: time and spread:
>
> > head(tmp$time)
> [1] 1185882786 1185882790 1185882791 1185882791 1185882792 1185882795
>
> > head(tmp$spread)
> [1] 1e-04 1e-04 2e-04 1e-04 2e-04 1e-04
> >
>
> I also have a function that calculates the time-weighted average spread:
>
> > twas
> function(dat) {
>   data.frame(tapply(diff(dat$time), head(dat$spread, -1),
> sum)/sum(diff(dat$time)) * 100.0)
> }
>
> I can combine them using rbind() and by():
>
> z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
>
> (epoch is just an instance of ISOdatetime)
>
> This gives me a warning:
>
> Warning message:
> number of columns of result
>        is not a multiple of vector length (arg 3) in: rbind(1, "12" = c(
> 91.99207541277, 8.00792458723005), "13" = c(90.1884966797708,
>
> The output from the above command is almost exactly what I need, apart from
> the recycling:
>
>      1e-04     2e-04      3e-04        4e-04
> 12 91.99208  8.007925 91.9920754  8.007924587 <== recycled values
> 13 90.18850  9.337448  0.4218405  0.052214551
> 14 90.59640  9.171417  0.2321811 90.596401668
> 15 89.55771 10.194291  0.2343418  0.013661453
> ...
>
> I can just pass this into a barplot() and get a nice visual breakdown of
> hourly weighted spreads, *but* I don't know how to get these results without
> the recycling. Looking at rbind(), it seems that this will automatically
> recycle. Does anyone know of a function I could use to get these results
> without this problem?
>
> Cheers
> Rory
>
> On 8/1/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> >
> > Something similar was just discussed this morning:
> > https://www.stat.math.ethz.ch/pipermail/r-help/2007-August/137695.html
> >
> >
> > On 8/1/07, Rory Winston <rory.winston at gmail.com> wrote:
> > > Hi all
> > >
> > > I have a question about aggregating statistics by time intervals. I have a
> > > data set with 3 columns: time, bid, and ask. Time is specified as a
> > > millisecond timestamp since epoch. I would like to compute summary
> > > statistics for the data set on an hourly basis. Here is what I have tried
> > > so far:
> > >
> > > # Data is in pricedata
> > >
> > > t <- ISOdatetime(1970, 1, 1, 0, 0, 0) + pricedata$time
> > > agg <- aggregate(pricedata$spread, list(byhour = format(t, "%Y-%m %H")), mean)
> > >
> > > This seems to do what I want; however, what I really want to do is more
> > > specific: I would like to be able to extract a subset of the data frame
> > > pricedata, and not just the aggregated entries. For instance, instead of
> > > just extracting pricedata$spread by hour, I would like to extract a slice
> > > of columns, e.g. pricedata$spread and pricedata$time, on an hourly basis
> > > and pass these into a function that can compute a time-weighted average
> > > spread. Does anyone know an elegant way to do this? I have a feeling
> > > zoo may do what I want, but I'm new to zoo...
> > >
> > > Cheers
> > > Rory
> > >
> > > _______________________________________________
> > > R-SIG-Finance at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > > -- Subscriber-posting only.
> > > -- If you want to post, subscribe first.
> > >
> >
>
>
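For the aggregate() attempt in the original question, a hedged sketch on the sample data from this thread. Note that the base function is ISOdatetime() (lower-case "d"), and that the format string here is "%Y-%m-%d %H", because "%Y-%m %H" would pool the same hour of day across every day in a month. tz = "UTC" is an assumption; ISOdatetime() otherwise uses the local time zone. The sample times fall in 2007 when read as seconds, so they are not divided by 1000; a genuine millisecond feed would need that first.

```r
# Sample data from the thread (times read as seconds since the epoch)
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
                           1185882792, 1185882795),
                  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 1e-04))

# Epoch as a POSIXct in UTC (assumption), then add the offsets in seconds
t <- ISOdatetime(1970, 1, 1, 0, 0, 0, tz = "UTC") + tmp$time

# One bucket per calendar hour: date plus hour of day
agg <- aggregate(tmp$spread, list(byhour = format(t, "%Y-%m-%d %H")), mean)
agg
```

Swapping mean for a function of several columns is where by() or split() comes in, since aggregate() here hands FUN one column at a time.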


