[R-SIG-Finance] question on zoo data manipulation

Manoj manojsw at gmail.com
Thu Apr 17 06:41:08 CEST 2008


Thanks a lot Gabor!

I was playing around with sqldf yesterday and thought of the same only
to find a solution on those lines - much apppreciated!

Cheers

Manoj

On 4/17/08, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> I thought about this one a bit more and have two
> additional solutions.  One uses zoo more intensively
> by forming the "time" index out of the merge keys.
> This relies on the fact that zoo can use any class
> with certain methods, not just the usual time/date
> classes.  However, I think that the best wayof thinking
> about this problem is from an SQL viewpoint since its
> basically just a three way self merge followed by
> an aggregation and the entire thing can be done
> in a single SQL statement (although it spans several
> lines).  Solution 1 is our prior minimally zoo solution,
> solution 2 is the much more zoo-ish solution and
> solution 3 uses sqldf to implement it in SQL.
>
> DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L,
> 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L,
> 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L,
> 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"),
>    Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L,
>    200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8,
>    3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L,
>    1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName",
> "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA,
> -9L))
>
>
> # zoo solution 1
>
> library(zoo)
>
> f <- function(x) {
>    br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"])
>    sb <- subset(x, BrokerName %in% br)
>    ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean)
>    transform(tail(ag, -1), Measure =
>        coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1))
> }
> do.call("rbind", by(DF, DF[c(1, 4)], f))
>
> # zoo solution 2
>
> library(zoo)
>
> z <- zoo(DF$Measure, apply(DF[c(1, 3, 4, 6)], 1, paste, collapse = ":"))
>
> zl <- zoo(DF$Measure,
>    apply(transform(DF[c(1, 3, 4, 6)], lag = lag+1), 1, paste, collapse = ":"))
>
> zm <- merge(z, zl, all = FALSE)
>
> z01 <- zm[sub(":[0-9]*$", ":1", time(zm)) %in% time(zm)]
>
> transform(aggregate(z01, sub(":[^:]*", "", time(z01)), mean), Change = z/zl-1)
>
> # solution 3 - sqldf
>
> library(sqldf)
> sqldf("select Ticker, Date__1, Acc_Yr, lag, avg(Measure)/avg(Mprev)-1 Change
>    from (select y.*, x.Measure Mprev from DF x, DF y, DF z
>    where x.Ticker = y.Ticker and x.Acc_Yr = y.Acc_Yr
>        and x.BrokerName = y.BrokerName and x.lag = y.lag - 1
>        and x.Ticker = z.Ticker and x.Acc_Yr = z.Acc_Yr
>        and x.BrokerName = z.BrokerName and z.lag = 1)
>    group by Ticker, Acc_Yr, lag")
>
>
>
> On Tue, Apr 15, 2008 at 8:56 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
> > This does not really use zoo in any significant way since it
> > does not become a time series until the last line of f but
> > here is a solution that does use zoo in that one last line.
> >
> > We use by to split up the data frame with a function f.
> > In f, br intersects the lag0 and lag1 brokers and then we
> > subset x according to those lines having a broker in br.
> > We then take the means, convert the series to zoo and
> > perform diff.zoo on it.
> >
> > There are some aspects of the problem that were not defined
> > such as whether to use lag 1 to compare lag 3 if there is no
> > lag 2 and we did that here but that could be changed by using
> > the lag as the time index.
> >
> > We have also dumped out the data frame, DF, using dput to make
> > it easier to reproduce the solution.
> >
> > DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> > 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L,
> > 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L,
> > 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L,
> > 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"),
> >    Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L,
> >    200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8,
> >    3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L,
> >    1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName",
> > "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA,
> > -9L))
> >
> > library(zoo)
> >
> > f <- function(x) {
> >    br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"])
> >    sb <- subset(x, BrokerName %in% br)
> >    ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean)
> >    transform(tail(ag, -1), Measure =
> >        coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1))
> > }
> > do.call("rbind", by(DF, DF[c(1, 4)], f))
> >
> >
> > On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw at gmail.com> wrote:
> >
> > > Hi Zoo-experts,
> > >      I am working on the data-set below.
> > >
> > > Ticker  Date    BrokerName      Acc_Yr  Measure lag
> > > XXX     20080320        BRK1    200806  2.2     0
> > > XXX     20080320        BRK1    200906  2.5     0
> > > XXX     20080320        BRK2    200806  2.3     0
> > > XXX     20080320        BRK2    200906  2.8     0
> > > XXX     20080320        BRK3    200806  3.3     0
> > > XXX     20080218        BRK1    200806  2.2     1
> > > XXX     20080218        BRK1    200906  2.5     1
> > > XXX     20080218        BRK2    200806  2.4     1
> > > XXX     20080218        BRK2    200906  2.8     1
> > >
> > >
> > >
> > > Using zoo object, Is there a quicker/efficient way of manipulating the
> > > data as per following criteria?
> > >
> > > 1) For any given date/lag - compute mean of column "measure" grouped
> > > by different broker & different accounting year?
> > >          so the output data-set should look like:
> > >
> > > Ticker  Date    Mean Measure    Acc_Yr  Lag
> > > XXX     20080320        2.6     200806  0
> > >
> > > 2) For any lag >= 1, calculate returns on  aggregate "measure"
> > > constrained on "intersection" of broker-name across lag 0 & lag 1 (so
> > > BRK3 should drop out) ?
> > >
> > > i.e:  the intermediate data-set should look like
> > >
> > > Ticker  Date    Mean Measure    Acc_Yr  Lag
> > > XXX     20080320        2.25    200806  0
> > > XXX     20080318        2.3     200806  1
> > >
> > >
> > > Note that for 200806, the mean changes from 2.6 as measured above to
> > > 2.25 (since BRK3 is dropped in calculation.  The final data-set should
> > > then be:
> > >
> > > Ticker  Date    Pct_Change      Acc_Yr  Lag
> > > XXX     20080218        0.02    200806  1
> > >
> > > --------------------
> > >
> > > I can accomplish the results using a combination of tapply &
> > > subsetting the data-set for each lag but I thought this kind of
> > > data-structure is ideal for zoo manipulation, hence the help request.
> > >
> > > Thanks in Advance.
> > >
> > > Manoj
> > >
> > > _______________________________________________
> > > R-SIG-Finance at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > > -- Subscriber-posting only.
> > > -- If you want to post, subscribe first.
> > >
> >
>



More information about the R-SIG-Finance mailing list