[R-SIG-Finance] Vectorized rolling computation on xts series

Joshua Ulrich josh.m.ulrich at gmail.com
Thu Oct 8 03:57:03 CEST 2009


On Wed, Oct 7, 2009 at 3:05 AM, Mark Breman <breman.mark at gmail.com> wrote:
> Hi,
> I have a univariate xts timeseries (daily data) for which I need to apply a
> computation for each element. The computation for element x needs the last y
> months of the data from the timeseries. What's more, I need a "vectorized"
> computation because looping over all elements is too slow (it's a large
> timeseries).
>
> I think this is what is called a "rolling" or "running" computation in R.
>
> The computation I need to do for element x is:
> - calculate the percentage of the value x within the range of values from
> the last y months, i.e. determine the min() and max() of the last y months
> of data (including x), and determine what percentage of this range the value
> x is. For example: min(last 1 months) == 10, max(last 1 months) == 50, x ==
> 20 would yield: 25%
> - elements for which y months of previous data (including x itself) is not
> available should become NaN or some other "special value".
>
> An example
> So let's say I have a timeseries called "data":
>
>> data
>           NonCommNet
> 1995-01-03      44580
> 1995-01-04      44580
> 1995-01-05      44580
> 1995-01-06      44580
> 1995-01-09      44580
> 1995-01-10      32835
> 1995-01-11      32835
> 1995-01-12      32835
> 1995-01-13      32835
> 1995-01-16      32835
> 1995-01-17      38385
> 1995-01-18      38385
> 1995-01-19      38385
> 1995-01-20      38385
> 1995-01-23      38385
> 1995-01-24      19150
> 1995-01-25      19150
> 1995-01-26      19150
> 1995-01-27      19150
> 1995-01-30      19150
> 1995-01-31      15245
> 1995-02-01      15245
> 1995-02-02      15245
> 1995-02-03      15245
> 1995-02-06      15245
> 1995-02-07      24110
> 1995-02-08      24110
> 1995-02-09      24110
> 1995-02-10      24110
> 1995-02-13      24110
> 1995-02-14      17615
> 1995-02-15      17615
> 1995-02-16      17615
> 1995-02-17      17615
> 1995-02-21     -23080
> 1995-02-22     -23080
> 1995-02-23     -23080
> 1995-02-24     -23080
> 1995-02-27     -23080
> 1995-02-28     -17445
>
> I tried the following "vectorized" solution (example with y = 1 month):
>> ((data - min(last(data, "1 months"))) /
> +   (max(last(data, "1 months")) - min(last(data, "1 months")))) * 100
>           NonCommNet
> 1995-01-03  143.37783
> 1995-01-04  143.37783
> 1995-01-05  143.37783
> 1995-01-06  143.37783
> 1995-01-09  143.37783
> 1995-01-10  118.48909
> 1995-01-11  118.48909
> 1995-01-12  118.48909
> 1995-01-13  118.48909
> 1995-01-16  118.48909
> 1995-01-17  130.25005
> 1995-01-18  130.25005
> 1995-01-19  130.25005
> 1995-01-20  130.25005
> 1995-01-23  130.25005
> 1995-01-24   89.48930
> 1995-01-25   89.48930
> 1995-01-26   89.48930
> 1995-01-27   89.48930
> 1995-01-30   89.48930
> 1995-01-31   81.21424
> 1995-02-01   81.21424
> 1995-02-02   81.21424
> 1995-02-03   81.21424
> 1995-02-06   81.21424
> 1995-02-07  100.00000
> 1995-02-08  100.00000
> 1995-02-09  100.00000
> 1995-02-10  100.00000
> 1995-02-13  100.00000
> 1995-02-14   86.23649
> 1995-02-15   86.23649
> 1995-02-16   86.23649
> 1995-02-17   86.23649
> 1995-02-21    0.00000
> 1995-02-22    0.00000
> 1995-02-23    0.00000
> 1995-02-24    0.00000
> 1995-02-27    0.00000
> 1995-02-28   11.94109
>
> This does not satisfy my constraints because:
> 1) the first month of data should have become NaN or some other special
> value as there is not a full month of previous data available. I think this
> is caused by the last() function which simply returns the available data if
> the requested amount of data is greater than the available amount of data.
> 2) the results for the second month of data are wrong. For instance, look at
> the result for 1995-02-06, which is 81.21424%. This should have been 0%. The
> last month's min() is 15245 (from 1995-02-06) and the max() is 44580 (from
> 1995-01-06), so it should yield 0%.
>
> From analyzing the results I get the impression that the last() function is
> not suited for a "vectorized" solution but I'm not really sure...
>
> I also had a look at runMin() and runMax() from the TTR package, but you
> can't specify a calendar range with these functions as you can with last()
> and first() from the xts package.
>
> Now my question is: am I doing something wrong here or do you know another
> vectorized function that satisfies my constraints?
>
> Kind regards,
>
> -Mark-

Hi Mark,

I don't think there's currently a "vectorized" way to do what you
want.  That said, I've started working on rolling analysis functions
in xts, so this type of functionality may make its way into the package
at some point.
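
As for why the last()-based expression gives those numbers: the output in
the question is consistent with last(data, "1 months") being evaluated once
for the whole series, so every element is scaled against the min and max of
the final calendar month (February 1995) instead of its own trailing
window. A minimal base-R sketch that reproduces the figures under that
reading; `dates` and `vals` are only a hypothetical re-creation of the
sample data as plain vectors, not xts objects:

```r
# Hypothetical re-creation of the sample series as plain vectors
# (weekdays from 1995-01-03 to 1995-02-28, skipping 1995-02-20).
dates <- seq(as.Date("1995-01-03"), as.Date("1995-02-28"), by = "day")
dates <- dates[as.POSIXlt(dates)$wday %in% 1:5 & dates != as.Date("1995-02-20")]
vals  <- rep(c(44580, 32835, 38385, 19150, 15245, 24110, 17615, -23080, -17445),
             times = c(5, 5, 5, 5, 5, 5, 4, 5, 1))

# One fixed window: the final calendar month of the series.
feb <- vals[format(dates, "%Y-%m") == "1995-02"]

# min(feb) and max(feb) are single numbers, so every element is scaled
# against the same range; nothing "rolls".
pct <- (vals - min(feb)) / (max(feb) - min(feb)) * 100
round(pct[dates == as.Date("1995-02-06")], 5)   # 81.21424, as in the question
```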

I think the function below is close to what you want.  It would be
faster to subset the xts object by numeric values rather than strings,
but I'm not sure how to do that given your constraints...

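pctRank <- function(x, n=1, freq='months') {
  # Coerce to xts; stop with a clear message if that is not possible
  x <- try.xts(x, error=stop("'x' must be coercible to xts"))
  if(NCOL(x) != 1) stop("'x' must be univariate")

  # Step strings for seq(): forward (NA range) and backward (lookback window)
  seqPos <- paste(n," ",freq,sep="")
  seqNeg <- paste("-",n," ",freq,sep="")

  # Date range at the start of the series that lacks n full periods of history
  naRange <- seq(index(first(x)),by=seqPos,length.out=2)
  naRangeStr <- paste(naRange[1],naRange[2]-1,sep='/')

  res <- sapply(seq_len(NROW(x)),
    function(i) {
      # Window from n periods before observation i up to and including it
      seq_i <- rev(seq(index(x[i]),by=seqNeg,length.out=2))
      rng_i <- range(x[paste(seq_i,collapse="/")])
      # Position of x[i] within the window's range, as a fraction in [0, 1]
      # (multiply by 100 for a percentage)
      res_i <- (x[i]-rng_i[1])/(rng_i[2]-rng_i[1])
      return(res_i)
    })
  res <- xts(res, index(x))
  # Blank out the observations without a full lookback window
  res[naRangeStr] <- rep(NA,NROW(res[naRangeStr]))
  return(res)
}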
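
To make the windowing step concrete, here is how the range string for a
single date is built, using only base R. The date 1995-02-06 comes from the
sample data, and the min/max/x figures in the second half are the 10/50/20
illustration from the question; the variable names are just for this
sketch.

```r
# Trailing one-month window ending at a single date.
d     <- as.Date("1995-02-06")
seq_i <- rev(seq(d, by = "-1 months", length.out = 2))  # window start, window end
win   <- paste(seq_i, collapse = "/")                   # "1995-01-06/1995-02-06"

# Position of x within the window's range, with the figures from the question.
lo <- 10; hi <- 50; x <- 20
frac <- (x - lo) / (hi - lo)                            # 0.25, i.e. 25%
```

The string in `win` is the kind of "from/to" range that the function passes
to the xts subsetting operator.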

> (out <- pctRank(data))
                [,1]
1995-01-03        NA
1995-01-04        NA
1995-01-05        NA
1995-01-06        NA
1995-01-09        NA
1995-01-10        NA
1995-01-11        NA
1995-01-12        NA
1995-01-13        NA
1995-01-16        NA
1995-01-17        NA
1995-01-18        NA
1995-01-19        NA
1995-01-20        NA
1995-01-23        NA
1995-01-24        NA
1995-01-25        NA
1995-01-26        NA
1995-01-27        NA
1995-01-30        NA
1995-01-31        NA
1995-02-01        NA
1995-02-02        NA
1995-02-03 0.0000000
1995-02-06 0.0000000
1995-02-07 0.3021987
1995-02-08 0.3021987
1995-02-09 0.3021987
1995-02-10 0.3831029
1995-02-13 0.3831029
1995-02-14 0.1024201
1995-02-15 0.1024201
1995-02-16 0.1024201
1995-02-17 0.1024201
1995-02-21 0.0000000
1995-02-22 0.0000000
1995-02-23 0.0000000
1995-02-24 0.0000000
1995-02-27 0.0000000
1995-02-28 0.1194109
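
On subsetting by numeric values rather than strings: one possible way,
sketched here under assumptions and not benchmarked, is to compute every
window's start date in one vectorized step and then look up its row
position with findInterval(). The sketch works on plain vectors (`dates`
and `vals`, a hypothetical re-creation of the sample data) rather than on
an xts object, assumes the dates are sorted, and the name pctRankIdx is
made up for illustration. It still loops over rows, but each iteration only
slices by integer position.

```r
# Hypothetical re-creation of the sample series as plain vectors
# (weekdays from 1995-01-03 to 1995-02-28, skipping 1995-02-20).
dates <- seq(as.Date("1995-01-03"), as.Date("1995-02-28"), by = "day")
dates <- dates[as.POSIXlt(dates)$wday %in% 1:5 & dates != as.Date("1995-02-20")]
vals  <- rep(c(44580, 32835, 38385, 19150, 15245, 24110, 17615, -23080, -17445),
             times = c(5, 5, 5, 5, 5, 5, 4, 5, 1))

pctRankIdx <- function(dates, vals, n = 1) {
  # Window start for every row: same day of month, n months earlier.
  lt <- as.POSIXlt(dates)
  lt$mon <- lt$mon - n
  starts <- as.Date(lt)

  # First row position whose date is >= the window start
  # (left.open = TRUE counts the dates strictly before the start).
  first <- findInterval(as.numeric(starts), as.numeric(dates),
                        left.open = TRUE) + 1

  res <- vapply(seq_along(vals), function(i) {
    rng <- range(vals[first[i]:i])           # slice by integer position
    (vals[i] - rng[1]) / (rng[2] - rng[1])   # fraction of the window's range
  }, numeric(1))

  # Rows without a full n months of history become NA.
  cutoff <- seq(dates[1], by = paste(n, "months"), length.out = 2)[2]
  res[dates < cutoff] <- NA
  res
}

out <- pctRankIdx(dates, vals)
```

For the sample data this gives NA through 1995-02-02 and the same fractions
as the output above (for example 0.3021987 on 1995-02-07).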

HTH,
Josh
--
http://www.fosstrading.com


