[R-SIG-Finance] question on zoo data manipulation
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Apr 15 14:56:46 CEST 2008
This does not really use zoo in any significant way since it
does not become a time series until the last line of f but
here is a solution that does use zoo in that one last line.
We use by to split up the data frame with a function f.
In f, br intersects the lag0 and lag1 brokers and then we
subset x according to those lines having a broker in br.
We then take the means, convert the series to zoo and
perform diff.zoo on it.
There are some aspects of the problem that were not defined
such as whether to use lag 1 to compare lag 3 if there is no
lag 2 and we did that here but that could be changed by using
the lag as the time index.
We have also dumped out the data frame, DF, using dput to make
it easier to reproduce the solution.
DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L,
20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L,
20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L,
3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"),
Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L,
200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8,
3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName",
"Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA,
-9L))
library(zoo)
f <- function(x) {
br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"])
sb <- subset(x, BrokerName %in% br)
ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean)
transform(tail(ag, -1), Measure =
coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1))
}
do.call("rbind", by(DF, DF[c(1, 4)], f))
On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw at gmail.com> wrote:
> Hi Zoo-experts,
> I am working on the data-set below.
>
> Ticker Date BrokerName Acc_Yr Measure lag
> XXX 20080320 BRK1 200806 2.2 0
> XXX 20080320 BRK1 200906 2.5 0
> XXX 20080320 BRK2 200806 2.3 0
> XXX 20080320 BRK2 200906 2.8 0
> XXX 20080320 BRK3 200806 3.3 0
> XXX 20080218 BRK1 200806 2.2 1
> XXX 20080218 BRK1 200906 2.5 1
> XXX 20080218 BRK2 200806 2.4 1
> XXX 20080218 BRK2 200906 2.8 1
>
>
>
> Using zoo object, Is there a quicker/efficient way of manipulating the
> data as per following criteria?
>
> 1) For any given date/lag - compute mean of column "measure" grouped
> by different broker & different accounting year?
> so the output data-set should look like:
>
> Ticker Date Mean Measure Acc_Yr Lag
> XXX 20080320 2.6 200806 0
>
> 2) For any lag >= 1, calculate returns on aggregate "measure"
> constrained on "intersection" of broker-name across lag 0 & lag 1 (so
> BRK3 should drop out) ?
>
> i.e: the intermediate data-set should look like
>
> Ticker Date Mean Measure Acc_Yr Lag
> XXX 20080320 2.25 200806 0
> XXX 20080318 2.3 200806 1
>
>
> Note that for 200806, the mean changes from 2.6 as measured above to
> 2.25 (since BRK3 is dropped in calculation. The final data-set should
> then be:
>
> Ticker Date Pct_Change Acc_Yr Lag
> XXX 20080218 0.02 200806 1
>
> --------------------
>
> I can accomplish the results using a combination of tapply &
> subsetting the data-set for each lag but I thought this kind of
> data-structure is ideal for zoo manipulation, hence the help request.
>
> Thanks in Advance.
>
> Manoj
>
> _______________________________________________
> R-SIG-Finance at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only.
> -- If you want to post, subscribe first.
>
More information about the R-SIG-Finance
mailing list