[R-SIG-Finance] How can I do this better (handling "realtime" macroeconomic data)?

Ajay Shah ajayshah at mayin.org
Wed Jul 1 05:19:20 CEST 2009


One problem faced with macroeconomic data is that of data revision. On
date t1 you're told a value x, and then on date t2 you're told that
same value has been changed to y. It is often fine to merely ignore
older values. But in some problems, it becomes important to track the
time-series as observed at past dates. In order to address this
problem, we need to store not just the time-series as a set of
(date,value) pairs, but also an additional field "infodate" which is
the date on which a given record was observed.

Here's an example of this data representation:

  a <- structure(list(date = c("2007-04-01", "2007-04-01", "2007-05-01", 
    "2007-04-01", "2007-05-01", "2007-06-01", "2007-05-01", "2007-06-01", 
    "2007-07-01"), infodate = structure(c(13634, 13665, 13665, 13695, 
    13695, 13695, 13726, 13726, 13726), class = "Date"), value = c(42L, 
    43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L)), .Names = c("date", 
    "infodate", "value"), row.names = c(NA, -9L), class = "data.frame")

So this is a dataset containing date (a string), infodate (a Date) and value.
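
The numeric infodate values above are days since 1970-01-01, which makes the
vintages hard to read. As a sketch for readability only, the same data can be
written with explicit dates (the name a2 is purely illustrative and is not
used elsewhere in this message):

```r
# Same nine records as `a`, with infodate spelled out as calendar dates.
# `a2` is an illustrative name, not part of the original code.
a2 <- data.frame(
  date     = c("2007-04-01", "2007-04-01", "2007-05-01",
               "2007-04-01", "2007-05-01", "2007-06-01",
               "2007-05-01", "2007-06-01", "2007-07-01"),
  infodate = as.Date(c("2007-05-01", "2007-06-01", "2007-06-01",
                       "2007-07-01", "2007-07-01", "2007-07-01",
                       "2007-08-01", "2007-08-01", "2007-08-01")),
  value    = c(42L, 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L),
  stringsAsFactors = FALSE
)
```

Reading the rows for date 2007-04-01: the value was first reported as 42 on
2007-05-01, revised to 43 on 2007-06-01, and revised again to 49 on 2007-07-01.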

Using this representation, I wrote a function which queries the
dataset and reports the time series as seen on a given date. If a
value for ondate is supplied, only records with infodate <= ondate are
used.

  fetch.ts <- function(d, ondate=NULL) {
    if (!is.null(ondate)) {
      d <- subset(d, d$infodate <= ondate)
    }
    # Now we walk through the series, and every time a new value for
    # a given date shows up, we overwrite the previous version.
    # (Rows are assumed to be stored in order of infodate.)
    x <- d$value[1]; names(x)[1] <- d$date[1]
    for (i in 2:nrow(d)) {
      x[d$date[i]] <- d$value[i]
    }
    x   # named vector: names are dates, values are the latest vintage
  }

This seems to work okay:

  all.equal(fetch.ts(a), structure(c(49L, 56L, 67L, 77L),
                                   .Names = c("2007-04-01",
                                     "2007-05-01", "2007-06-01", "2007-07-01")))
  fetch.ts(a, "2007-07-01")
  all.equal(fetch.ts(a, "2007-07-01"),
            structure(c(49L, 55L, 66L),
                      .Names = c("2007-04-01", "2007-05-01", "2007-06-01")))

but I'm not happy with my loop-intensive solution. Also, the use of
associative arrays (using the names in R) might be quite
expensive. How would you improve on this?
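
One possible direction, given only as a rough sketch and not a tested
replacement: sort by infodate and keep the last record for each date using
duplicated(..., fromLast = TRUE), which avoids both the loop and the
name-based assignment. The function name fetch.ts.vec is hypothetical, and the
sketch assumes date strings are in ISO format so that they sort
chronologically.

```r
# Example data, as in the message above.
a <- data.frame(
  date     = c("2007-04-01", "2007-04-01", "2007-05-01",
               "2007-04-01", "2007-05-01", "2007-06-01",
               "2007-05-01", "2007-06-01", "2007-07-01"),
  infodate = structure(c(13634, 13665, 13665, 13695, 13695, 13695,
                         13726, 13726, 13726), class = "Date"),
  value    = c(42L, 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L),
  stringsAsFactors = FALSE
)

# Hypothetical vectorised variant of fetch.ts (illustrative name).
fetch.ts.vec <- function(d, ondate = NULL) {
  if (!is.null(ondate)) {
    d <- d[d$infodate <= as.Date(ondate), , drop = FALSE]
  }
  # Sort by infodate so the latest vintage of each date comes last;
  # order() leaves ties in their original row order.
  d <- d[order(d$infodate), , drop = FALSE]
  # Keep only the last record seen for each date.
  d <- d[!duplicated(d$date, fromLast = TRUE), , drop = FALSE]
  # Return a named vector ordered by date (assumes ISO date strings).
  d <- d[order(d$date), , drop = FALSE]
  setNames(d$value, d$date)
}

fetch.ts.vec(a)                # series using all vintages
fetch.ts.vec(a, "2007-07-01")  # series as seen on 2007-07-01
```

Unlike the loop, this returns the dates in sorted order and not in order of
first appearance; on the example data the two orderings coincide.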

Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah at mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.
