[R-SIG-Finance] Aggregating Statistics By Time Interval

Fri Aug 3 14:20:52 CEST 2007

I still get no warning.  Please provide complete, self-contained input
and output.

> tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
+  1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
+  2e-04, 3e-04))
>
> twas <-
+  function(dat) {
+    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
+  sum)/sum(diff(dat$time)) * 100.0)
+ }
> now <- Sys.time()
> epoch <- now - as.numeric(now)
> z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
> z
      1e-04    2e-04
07 66.66667 33.33333
>
> R.version.string # XP
[1] "R version 2.5.1 (2007-06-27)"
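
One possible reason the run above stays silent, offered as a sketch rather
than a diagnosis: twas() pairs each interval with the spread in force at its
start via head(dat$spread, -1), so the final spread value (3e-04 here) never
reaches tapply(). The snippet below recomputes the pieces by hand; dt, sp and
pct are illustrative names, not part of the original code.

```r
tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
                           1185882792, 1185882795),
                  spread = c(1e-04, 1e-04, 2e-04, 1e-04, 2e-04, 3e-04))

dt <- diff(tmp$time)          # interval lengths in seconds: 4 1 0 1 3
sp <- head(tmp$spread, -1)    # spread at the start of each interval

unique(sp)                    # only 1e-04 and 2e-04; the trailing 3e-04 is dropped

# Share of total time spent at each spread level, in percent
pct <- tapply(dt, sp, sum) / sum(dt) * 100
pct
```

So a third spread value only shows up in the result if it occurs somewhere
other than the last row of its hourly group.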

On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
> Hi
>
> I have figured out what causes the warning (and recycling), but I am not
> sure how I can fix it. After seeing that it seemed to work for you, I went
> back and tried working with different subsets of the data. I eventually
> found where it occurs - when we get a third unique spread value. To
> reproduce, just change the definition of tmp to be:
>
> tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
>  1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
>  2e-04, 3e-04)) <== Added 3e-04
>
> i.e. I have just changed one of the spread values to be a third value - this
> seems to trigger the warning "Warning message: number of columns of result
> is not a multiple of vector length (arg 3) in: rbind", and the recycling. I
> tried this on R 2.5.0 and 2.5.1.
>
> Can anyone see what I am doing wrong here?
>
> Cheers
> Rory
>
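
The recycling described above can be reproduced in isolation. This is a
minimal sketch with made-up numbers (h12 and h13 are hypothetical per-hour
results, not values from the real data): when one hour yields two spread
levels and another yields three, rbind() pads the shorter vector by reusing
its own values.

```r
# Hypothetical per-hour percentages: hour 12 saw two spread levels,
# hour 13 saw three.
h12 <- c("1e-04" = 92, "2e-04" = 8)
h13 <- c("1e-04" = 90, "2e-04" = 9.5, "3e-04" = 0.5)

# rbind() sizes the result by the longest vector and recycles the shorter
# one; it warns because 3 is not a multiple of 2.
z <- rbind("12" = h12, "13" = h13)
z

# The third cell of row "12" is a recycled copy of its first value,
# not a real measurement.
z["12", 3]
```

The warning itself is harmless; the problem is that the recycled cells look
like genuine percentages.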
> On 8/3/07, Gabor Grothendieck < ggrothendieck at gmail.com> wrote:
> > Can you provide a reproducible example that exhibits the warning?
> > Redoing it in a more easily reproducible way and using the data
> > in your post gives me no warning:
> >
> > > tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
> > + 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
> > + 2e-04, 1e-04))
> > >
> > > twas <-
> > +  function(dat) {
> > +    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
> > +  sum)/sum(diff(dat$time)) * 100.0)
> > + }
> > > now <- Sys.time()
> > > epoch <- now - as.numeric(now)
> > > z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
> > > z
> >       1e-04    2e-04
> > 07 66.66667 33.33333
> > > R.version.string # XP
> > [1] "R version 2.5.1 (2007-06-27)"
> >
> >
> > Here is input:
> >
> > tmp <- data.frame(time = c(1185882786, 1185882790, 1185882791, 1185882791,
> > 1185882792, 1185882795), spread = c(1e-04, 1e-04, 2e-04, 1e-04,
> > 2e-04, 1e-04))
> > twas <-
> > function(dat) {
> >    data.frame(tapply(diff(dat$time), head(dat$spread, -1),
> > sum)/sum(diff(dat$time)) * 100.0)
> > }
> > now <- Sys.time()
> > epoch <- now - as.numeric(now)
> > z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
> > z
> > R.version.string # XP
> >
> >
> >
> > On 8/3/07, Rory Winston <rory.winston at gmail.com> wrote:
> > > Hi
> > >
> > > I've been wrestling with this a little bit, using the example in the email
> > > that Gabor pointed me to as a reference, and I think I have almost got what
> > > I want... however, it's still not quite right.
> > >
> > > I have a variable, tmp, with two dimensions: time and spread:
> > >
> > > > head(tmp$time)
> > > [1] 1185882786 1185882790 1185882791 1185882791 1185882792 1185882795
> > >
> > > > head(tmp$spread)
> > > [1] 1e-04 1e-04 2e-04 1e-04 2e-04 1e-04
> > > >
> > >
> > > I also have a function that calculates the time-weighted average spread:
> > >
> > > > twas
> > > function(dat) {
> > >   data.frame(tapply(diff(dat$time), head(dat$spread, -1),
> > > sum)/sum(diff(dat$time)) * 100.0)
> > > }
> > >
> > > I can combine them using rbind() and by():
> > >
> > > z <- do.call("rbind", by(tmp, format(epoch + tmp$time, "%H"), twas))
> > >
> > > (epoch is just an instance of ISOdatetime)
> > >
> > > This gives me a warning:
> > >
> > > Warning message:
> > > number of columns of result
> > >        is not a multiple of vector length (arg 3) in: rbind(1, "12" = c(
> > > 91.99207541277, 8.00792458723005), "13" = c(90.1884966797708,
> > >
> > > The output from the above command is almost exactly what I need, apart from
> > > the recycling:
> > >
> > >      1e-04     2e-04      3e-04        4e-04
> > > 12 91.99208  8.007925 91.9920754  8.007924587 <== recycled values
> > > 13 90.18850  9.337448  0.4218405  0.052214551
> > > 14 90.59640  9.171417  0.2321811 90.596401668
> > > 15 89.55771 10.194291  0.2343418  0.013661453
> > > ...
> > >
> > > I can just pass this into a barplot() and get a nice visual breakdown of
> > > hourly weighted spreads, *but* I don't know how to get these results without
> > > the recycling. Looking at rbind(), it seems that this will automatically
> > > recycle. Does anyone know of a function I could use to get these results
> > > without this problem?
> > >
> > > Cheers
> > > Rory
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 8/1/07, Gabor Grothendieck <ggrothendieck at gmail.com > wrote:
> > > >
> > > > Something similar was just discussed this morning:
> > > >
> > > > > https://www.stat.math.ethz.ch/pipermail/r-help/2007-August/137695.html
> > > >
> > > >
> > > > On 8/1/07, Rory Winston <rory.winston at gmail.com> wrote:
> > > > > Hi all
> > > > >
> > > > > > I have a question about aggregating statistics by time intervals. I have a
> > > > > > data set with 3 columns: time, bid, and ask. Time is specified as a
> > > > > > millisecond timestamp since epoch. I would like to compute summary
> > > > > > statistics for the data set on an hourly basis. Here is what I have tried so
> > > > > > far:
> > > > >
> > > > > # Data is in pricedata
> > > > >
> > > > > > t <- ISOdatetime(1970, 1, 1, 0, 0, 0) + pricedata$time
> > > > > > agg <- aggregate(pricedata$spread, list(byhour=format(t, "%Y-%m %H")), mean)
> > > > >
> > > > > > This seems to do what I want - however, what I really want to do is more
> > > > > > specific: I would like to be able to extract a subset of the data frame
> > > > > > pricedata, and not just the aggregated entries - for instance, instead of
> > > > > > just extracting pricedata$spread by hour, I would like to extract a slice of
> > > > > > columns, e.g. pricedata$spread and pricedata$time on an hourly basis, and
> > > > > > pass these into a function that can compute a time-weighted average spread,
> > > > > > for instance. Does anyone know an elegant way to do this? I have a feeling
> > > > > > zoo may do what I want, but I'm new to zoo ...
> > > > >
> > > > > Cheers
> > > > > Rory
> > > > >
> > > > >        [[alternative HTML version deleted]]
> > > > >
> > > > > _______________________________________________
> > > > > R-SIG-Finance at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > > > > -- Subscriber-posting only.
> > > > > -- If you want to post, subscribe first.
> > > > >
> > > >
> > >
> > >
> >
>
>