[R-SIG-Finance] How can I do this better (handling "realtime" macroeconomic data)?
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Jul 1 09:24:12 CEST 2009
Try this:
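fetch2 <- function(d, as.of.date = Inf) {
  # For each date, return the row with the latest infodate <= as.of.date
  do.call(rbind, by(d, d$date, function(x) {
    # Filter first, so that which.max() indexes the filtered rows and
    # the result does not depend on how the rows of d are ordered
    x <- x[x$infodate <= as.of.date, ]
    x[which.max(x$infodate), ]
  }))
}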
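
If you would rather get back a named vector, as fetch.ts does, instead of the data frame rows that fetch2 returns, here is a loop-free sketch. It is not tested beyond the example data. The name fetch3 is made up for illustration, and the code assumes the columns are called date, infodate (class Date) and value, with date held as an ISO-formatted string so that sorting it as text is chronological. The example data frame is the one from the original post, rebuilt with data.frame().

```r
# Hypothetical helper (not from the thread): loop-free "as of" lookup.
# Assumes columns date (ISO string), infodate (class Date) and value.
fetch3 <- function(d, as.of.date = NULL) {
  # Keep only the records that had been published by as.of.date
  if (!is.null(as.of.date)) {
    d <- d[d$infodate <= as.Date(as.of.date), ]
  }
  # Sort by date, with the newest infodate first within each date
  d <- d[order(d$date, -as.numeric(d$infodate)), ]
  # The first row of each date group is now its latest revision
  d <- d[!duplicated(d$date), ]
  # Named vector, in the same shape as the result of fetch.ts
  setNames(d$value, d$date)
}

# Example data from the original post
a <- data.frame(
  date = c("2007-04-01", "2007-04-01", "2007-05-01", "2007-04-01",
           "2007-05-01", "2007-06-01", "2007-05-01", "2007-06-01",
           "2007-07-01"),
  infodate = as.Date(c("2007-05-01", "2007-06-01", "2007-06-01",
                       "2007-07-01", "2007-07-01", "2007-07-01",
                       "2007-08-01", "2007-08-01", "2007-08-01")),
  value = c(42L, 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L),
  stringsAsFactors = FALSE
)

fetch3(a)                # series using every revision received
fetch3(a, "2007-07-01")  # series as it was known on 2007-07-01
```

Sorting once and dropping duplicates avoids both the explicit loop and the repeated lookups by name, and it does not rely on the rows arriving in infodate order.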
On Tue, Jun 30, 2009 at 11:19 PM, Ajay Shah<ajayshah at mayin.org> wrote:
> Folks,
>
> One problem faced with macroeconomic data is that of data revision. On
> date t1 you're told a value x, and then on date t2 you're told that
> same value has been changed to y. It is often fine to merely ignore
> older values. But in some problems, it becomes important to track the
> time-series as observed at past dates. In order to address this
> problem, we need to store not just the time-series as a set of
> (date,value) pairs, but also an additional field "infodate" which is
> the date on which a given record was observed.
>
> Here's an example of this data representation:
>
> a <- structure(list(date = c("2007-04-01", "2007-04-01", "2007-05-01",
> "2007-04-01", "2007-05-01", "2007-06-01", "2007-05-01", "2007-06-01",
> "2007-07-01"), infodate = structure(c(13634, 13665, 13665, 13695,
> 13695, 13695, 13726, 13726, 13726), class = "Date"), value = c(42L,
> 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L)), .Names = c("date",
> "infodate", "value"), row.names = c(NA, -9L), class = "data.frame")
> a
> str(a)
>
> So this is a dataset containing date (a string), infodate (a Date) and value.
>
> Using this representation, I wrote a function which queries the
> dataset and reports the time series as seen on a given date. If a
> value for ondate is supplied, only records with infodate <= ondate are
> utilised.
>
> fetch.ts <- function(d, ondate=NULL) {
>   if (!is.null(ondate)) {
>     d <- subset(d, d$infodate <= ondate)
>   }
>
>   # Now we walk through the series, and every time a new value for
>   # a given date shows up, we overwrite the previous version.
>   # This relies on the rows being in increasing order of infodate.
>   x <- d$value[1]; names(x)[1] <- d$date[1]
>   for (i in seq_len(nrow(d))[-1]) {
>     x[d$date[i]] <- d$value[i]
>   }
>   x
> }
>
> This seems to work okay:
>
> fetch.ts(a)
> all.equal(fetch.ts(a),
>           structure(c(49L, 56L, 67L, 77L),
>                     .Names = c("2007-04-01", "2007-05-01",
>                                "2007-06-01", "2007-07-01")))
> fetch.ts(a, "2007-07-01")
> all.equal(fetch.ts(a, "2007-07-01"),
>           structure(c(49L, 55L, 66L),
>                     .Names = c("2007-04-01", "2007-05-01", "2007-06-01")))
>
> but I'm not happy with my loop-intensive solution. Also, the use of
> associative arrays (using the names in R) might be quite
> expensive. How would you improve on this?
>
> --
> Ajay Shah http://www.mayin.org/ajayshah
> ajayshah at mayin.org http://ajayshahblog.blogspot.com
> <*(:-? - wizard who doesn't know the answer.