[R-SIG-Finance] Vectorized rolling computation on xts series
Sandor Benczik
sandor.benczik at crabel.ro
Wed Oct 7 11:28:39 CEST 2009
| -----Original Message-----
| From: r-sig-finance-bounces at stat.math.ethz.ch [mailto:r-sig-finance-
| bounces at stat.math.ethz.ch] On Behalf Of Mark Breman
| The computation I need to do for element x is:
| - calculate the percentage of the value x within the range of values from
|   the last y months, i.e. determine the min() and max() of the last y months
|   of data (including x), and determine what percentage of this range the
|   value x is. For example: min(last 1 months) == 10, max(last 1 months) == 50,
|   x == 20 would yield: 25%
| - elements for which y months of previous data (including x itself) is not
|   available should become NaN or some other "special value".
| I tried the following "vectorized" solution (example with y = 1 month):
| > ((data - min(last(data, "1 months"))) / (max(last(data, "1 months")) -
|     min(last(data, "1 months")))) * 100
| This does not satisfy my constraints because:
| 1) the first month of data should have become NaN or some other special
| value as there is not a full month of previous data available. I think this
| is caused by the last() function which simply returns the available data if
| the requested amount of data is greater than the available amount of data.
As you said, your data has frequent gaps, so you will never have a full
month of previous data. Does that mean that you want a time series full
of NaNs? You should be careful how you define a 'full month'.
| 2) the results for the second month of data are wrong.
| From analyzing the results I get the impression that the last() function is
| not suited for a "vectorized" solution but I'm not really sure...
In your code you are not applying a 'vectorized' last. last(data, "1 months")
is evaluated once and returns the final month of the whole time series, so
every element is scaled by the same min and max. That is the reason for the
strange results.
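To make the point concrete, here is a minimal sketch on a made-up daily series (the series, the seed and the names lo, hi and pct are illustrative assumptions, not the original poster's data). It shows that min(last(...)) and max(last(...)) collapse to single numbers before the arithmetic is applied to the series.

```r
library(xts)

# Hypothetical toy series: 90 daily observations (values are illustrative).
set.seed(1)
idx  <- seq(as.Date("2009-01-01"), by = "day", length.out = 90)
data <- xts(cumsum(rnorm(90)) + 50, order.by = idx)

# last() is evaluated once, on the whole series, so these are single numbers.
lo <- min(last(data, "1 months"))
hi <- max(last(data, "1 months"))
length(lo)  # one value, not one value per observation

# Every element is therefore scaled against the final month's range only.
pct <- (data - lo) / (hi - lo) * 100
```

Only the final month is guaranteed to fall between 0 and 100 here; earlier observations are measured against a range they do not belong to.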
A couple of ideas: build an empty (0-column) xts with all dates
(including those not in your series), merge it with your series, and
then apply zoo's rollmax on a constant 30- or 31-day lookback window.
For the rolling minimum you can use rollapply with min, or rollmax on
the negated series. (These are fast; you may want to look up na.locf
for zoo, or the na.rm argument of max/min, to deal with the extra dates.)
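A rough sketch of the calendar-grid idea follows, on made-up data with gaps. The 30-day width, the choice to fill gaps with na.locf, and all object names (all_days, grid, full, roll_min, roll_max) are assumptions for illustration. It uses rollapply with max and min rather than dedicated rolling functions, since as far as I know zoo provides rollmax but no matching rolling-minimum helper.

```r
library(xts)  # also attaches zoo

# Hypothetical gappy series: 70 observations scattered over four months.
set.seed(42)
all_days <- seq(as.Date("2009-01-01"), as.Date("2009-04-30"), by = "day")
obs_days <- sort(sample(all_days, 70))
data     <- xts(runif(70, 10, 50), order.by = obs_days)

# Zero-column xts that carries every calendar date.
grid <- xts(order.by = all_days)

# Merging adds the missing dates as NA; na.locf carries the last
# observation forward (leading NAs, if any, are kept).
full <- na.locf(merge(data, grid), na.rm = FALSE)

# Rolling extremes over a fixed 30-calendar-day window ending at each date.
k <- 30
roll_max <- rollapply(full, k, max, align = "right", fill = NA)
roll_min <- rollapply(full, k, min, align = "right", fill = NA)

# Position of each value within its trailing range, in percent.
pct <- (full - roll_min) / (roll_max - roll_min) * 100

# Keep only the dates that were actually observed.
pct <- pct[index(data)]
```

Dates without 30 calendar days of history come out as NA, which matches the "special value" requirement without any extra bookkeeping.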
Even simpler: assume 21 business days per month and do a rollmin/max on
a window of 21. (Is it that big a problem if you are 1-2 days off?)
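The simpler variant could look like the sketch below, which rolls over the last 21 observations instead of calendar days. The weekday-only toy series and the names k, roll_min, roll_max and pct are assumptions for illustration.

```r
library(xts)  # also attaches zoo

# Hypothetical weekday-only series (values are illustrative).
set.seed(7)
idx  <- seq(as.Date("2009-01-01"), by = "day", length.out = 120)
idx  <- idx[!format(idx, "%u") %in% c("6", "7")]  # drop Saturdays and Sundays
data <- xts(cumsum(rnorm(length(idx))) + 100, order.by = idx)

# Treat one month as 21 observations; the window ends at the current element.
k <- 21
roll_max <- rollapply(data, k, max, align = "right", fill = NA)
roll_min <- rollapply(data, k, min, align = "right", fill = NA)

# The first k - 1 elements have no full window and stay NA.
pct <- (data - roll_min) / (roll_max - roll_min) * 100
```

This avoids the merge step entirely, at the cost of a window that drifts by a day or two relative to true calendar months.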
If neither works for you and speed is important, this might be a good
candidate for C code.
HTH,
Sandor