[R-SIG-Finance] efficient extraction of local extrema and zero-crossings in large multivariate zoo?

Christian Gunning icos.atropa at gmail.com
Thu Jun 25 12:37:53 CEST 2009


I have an multi-year hour-sampled multivariate zoo from which i'm
trying to extract index() and coredata() of daily max and
zero-crossing for each variable and use na.approx() to fill in
intervening values, so that the resulting zoo has the same dimensions
of the input zoo. Currently i'm looping over each column and day,
extracting a day's worth of data with window(), and using which.max()
to extract the "record".  For short zoos it works fine.  It doesn't
scale well, though - for example, 2e4 points takes ~4-6 times longer
than 1e4 points with 2 variables. Am i missing major bottlenecks or
vectorization potentials here?

find.extrema = function(myzoo) {
  days = as.POSIXct(levels(as.factor(as.Date(index(myzoo)))))
  ret = myzoo
  ret[] <- NA
  for (tcol in 1:dim(myzoo)[2]) {
    for (day in days) {
      this <- window(myzoo[,tcol], start=day, end=day+24*60*60-1)
      thismax <- this[which.max(this)]
      this[this<0] <- 0 ## remove negative values first
      thiszero <- this[which.min(this)]
      ret[index(ret) == index(thismax), tcol] <- coredata(thismax)
      ret[index(ret) == index(thiszero), tcol] <- coredata(thiszero)
      ### ret[index(thismax), tcol] <- coredata(thismax)  ### gives an error
    } # end days
  } # end tcol
  ret = na.approx(ret)
  return(ret)
}

hours=1e4 # about 2 year's worth of hours
tmp=(as.POSIXct('2001-01-01')+1:hours*60*60)
tmpz=zoo(cbind(a=sin(as.integer(tmp)/1e4), b=sin(as.integer(tmp)/1.1e4)), tmp)
system.time(tmpextrema <- find.extrema(tmpz))

thanks,
christian gunning
university of new mexico



More information about the R-SIG-Finance mailing list