[R-SIG-Finance] How can I do this better (handling "realtime" macroeconomic data)?

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 1 09:24:12 CEST 2009


Try this:

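# For each date, keep the row with the latest infodate <= as.of.date.
# Assumes rows within each date are already ordered by infodate.
fetch2 <- function(d, as.of.date = Inf) {
  do.call(rbind, by(d, d$date, function(x)
    x[which.max(x[x$infodate <= as.of.date, ]$infodate), ]
  ))
}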
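
Note that fetch2 indexes x with a position computed on the filtered subset, so it relies on the rows for each date being sorted by infodate (true for the example data), and it returns a data frame rather than the named vector that fetch.ts produces. If row order cannot be guaranteed, a minimal sketch along the following lines might work instead. The name fetch3 and the NULL default are my own assumptions, not something from the original post; the data frame is copied from the quoted message below.

```r
# Example data copied from the quoted message (infodate stored as days since 1970-01-01).
a <- data.frame(
  date = c("2007-04-01", "2007-04-01", "2007-05-01", "2007-04-01", "2007-05-01",
           "2007-06-01", "2007-05-01", "2007-06-01", "2007-07-01"),
  infodate = structure(c(13634, 13665, 13665, 13695, 13695, 13695,
                         13726, 13726, 13726), class = "Date"),
  value = c(42L, 43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: latest known value for each date as of as.of.date,
# returned as a named vector. Makes no assumption about row order.
fetch3 <- function(d, as.of.date = NULL) {
  if (!is.null(as.of.date)) {
    # Keep only the records that had been observed by as.of.date.
    d <- d[d$infodate <= as.Date(as.of.date), , drop = FALSE]
  }
  # Sort so that the newest revision of each date comes last ...
  d <- d[order(d$date, d$infodate), , drop = FALSE]
  # ... then keep only that last row per date.
  d <- d[!duplicated(d$date, fromLast = TRUE), , drop = FALSE]
  setNames(d$value, d$date)
}

fetch3(a)
fetch3(a, "2007-07-01")
```

This replaces the per-group function call with one sort and one duplicated() pass, which should scale better on long series, though I have not benchmarked it.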


On Tue, Jun 30, 2009 at 11:19 PM, Ajay Shah <ajayshah at mayin.org> wrote:
> Folks,
>
> One problem faced with macroeconomic data is that of data revision. On
> date t1 you're told a value x, and then on date t2 you're told that
> same value has been changed to y. It is often fine to merely ignore
> older values. But in some problems, it becomes important to track the
> time-series as observed at past dates. In order to address this
> problem, we need to store not just the time-series as a set of
> (date,value) pairs, but also an additional field "infodate" which is
> the date on which a given record was observed.
>
> Here's an example of this data representation:
>
>  a <- structure(list(date = c("2007-04-01", "2007-04-01", "2007-05-01",
>    "2007-04-01", "2007-05-01", "2007-06-01", "2007-05-01", "2007-06-01",
>    "2007-07-01"), infodate = structure(c(13634, 13665, 13665, 13695,
>    13695, 13695, 13726, 13726, 13726), class = "Date"), value = c(42L,
>    43L, 55L, 49L, 55L, 66L, 56L, 67L, 77L)), .Names = c("date",
>    "infodate", "value"), row.names = c(NA, -9L), class = "data.frame")
>  a
>  str(a)
>
> So this is a dataset containing date (a string), infodate (a Date) and value.
>
> Using this representation, I wrote a function which queries the
> dataset and reports the time series as seen on a given date. If a
> value for ondate is supplied, only records with infodate <= ondate are
> utilised.
>
>  fetch.ts <- function(d, ondate=NULL) {
>    if (!is.null(ondate)) {
>      d <- subset(d, d$infodate <= ondate)
>    }
>
>    # Now we walk through the series, and every time a new value for
>    # a given date shows up, we overwrite the previous version.
>    x <- d$value[1]; names(x)[1] <- d$date[1]
>    for (i in seq_len(nrow(d))[-1]) {
>      x[d$date[i]] <- d$value[i]
>    }
>    x
>  }
>
> This seems to work okay:
>
>  fetch.ts(a)
>  all.equal(fetch.ts(a), structure(c(49L, 56L, 67L, 77L),
>                                   .Names = c("2007-04-01",
>                                     "2007-05-01", "2007-06-01", "2007-07-01")))
>  fetch.ts(a, "2007-07-01")
>  all.equal(fetch.ts(a, "2007-07-01"),
>            structure(c(49L, 55L, 66L),
>                      .Names = c("2007-04-01", "2007-05-01", "2007-06-01")))
>
> but I'm not happy with my loop-intensive solution. Also, the use of
> associative arrays (via names in R) might be quite
> expensive. How would you improve on this?
>
> --
> Ajay Shah                                      http://www.mayin.org/ajayshah
> ajayshah at mayin.org                             http://ajayshahblog.blogspot.com
> <*(:-? - wizard who doesn't know the answer.