[R-SIG-Finance] Vectorized rolling computation on xts series

Aleks Clark aleks.clark at gmail.com
Wed Oct 7 11:11:17 CEST 2009


Another approach would be to use zoo's or xts's lag function to
generate a data frame or matrix with the current day's data and the N
previous periods in a table. If your data is univariate, this
shouldn't prove a problem: with a little arithmetic you can easily specify
how many days of data to "go back". You'd do something like
this:

starting with:

d1
d2
d3
d4
d5

use quantmod's Lag() (or lag(); they behave differently), then t() and apply(), and end up with:

d1 NA NA NA
d2 d1 NA NA
d3 d2 d1 NA
d4 d3 d2 d1
d5 d4 d3 d2

You can then easily run your computations in vectorized form using the
apply family of functions.
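A minimal base-R sketch of the lag table, assuming a plain numeric vector rather than an xts object. embed() from the stats package is swapped in for lag()/Lag() because it builds the whole NA-padded table in one call; lag_table is a hypothetical helper name. This version counts rows, not calendar days.

```r
# Hypothetical helper: row i holds x[i], x[i-1], ..., x[i-n+1],
# padded with NA where there is not enough history (uses stats::embed).
lag_table <- function(x, n) {
  embed(c(rep(NA, n - 1), x), n)
}

# The d1..d5 table above: current value plus 3 previous periods.
print(lag_table(paste0("d", 1:5), 4))

# Percent of range over the current and 2 previous observations.
# min()/max() return NA for rows that still contain padding, so rows
# without a full window come out as NA.
x <- c(44580, 32835, 38385, 19150, 15245, 24110)
lagged <- lag_table(x, 3)
lo <- apply(lagged, 1, min)
hi <- apply(lagged, 1, max)
pct <- (x - lo) / (hi - lo) * 100
print(pct)
```

apply() still calls min() once per row; for long series, do.call(pmin, as.data.frame(lagged)) (and likewise pmax) computes the same row minima column by column, with the same NA behaviour.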

On Wed, Oct 7, 2009 at 3:30 AM, Mark Breman <breman.mark at gmail.com> wrote:
> Hi Shane,
> I had a look at these functions but they do not satisfy my constraints:
>
> - apply.monthly works with calendar months, but I need a function that
> lets me specify, for instance, 1995-01-06 until 1995-02-06 (i.e. a
> 'duration' of one month) for the computation of element x = 1995-02-06.
>
> - rollapply (and also rollmax, rollmin) needs the number of previous
> elements of the series to be specified, if I understand it correctly. As you
> can see in the example, it is daily data with lots of gaps, so this would
> be very difficult to do, if at all possible.
>
> Thanks for your quick response though,
>
> Kind regards,
>
> -Mark-
>
> 2009/10/7 Shane <shane.conway at gmail.com>
>
>> I think you want the apply.monthly function in xts. There are versions for
>> other time periods too (e.g. apply.daily).
>>
>> You may also want to look at rollapply in zoo.
>>
>> Sent from my iPhone
>>
>>
>> On Oct 7, 2009, at 4:05 AM, Mark Breman <breman.mark at gmail.com> wrote:
>>
>>> Hi,
>>> I have a univariate xts timeseries (daily data) for which I need to apply
>>> a computation to each element. The computation for element x needs the
>>> last y months of data from the timeseries. What's more, I need a
>>> "vectorized" computation because looping over all elements is too slow
>>> (it's a large timeseries).
>>>
>>> I think this is what is called a "rolling" or "running" computation in R.
>>>
>>> The computation I need to do for element x is:
>>> - calculate the position of x within the range of values from the last
>>> y months as a percentage, i.e. determine the min() and max() of the last
>>> y months of data (including x) and compute (x - min) / (max - min) * 100.
>>> For example: min(last 1 months) == 10, max(last 1 months) == 50 and
>>> x == 20 would yield 25%.
>>> - elements for which y months of previous data (including x itself) are
>>> not available should become NaN or some other "special value".
>>>
>>> An example:
>>> So let's say I have a timeseries called "data":
>>>
>>> > data
>>>          NonCommNet
>>> 1995-01-03      44580
>>> 1995-01-04      44580
>>> 1995-01-05      44580
>>> 1995-01-06      44580
>>> 1995-01-09      44580
>>> 1995-01-10      32835
>>> 1995-01-11      32835
>>> 1995-01-12      32835
>>> 1995-01-13      32835
>>> 1995-01-16      32835
>>> 1995-01-17      38385
>>> 1995-01-18      38385
>>> 1995-01-19      38385
>>> 1995-01-20      38385
>>> 1995-01-23      38385
>>> 1995-01-24      19150
>>> 1995-01-25      19150
>>> 1995-01-26      19150
>>> 1995-01-27      19150
>>> 1995-01-30      19150
>>> 1995-01-31      15245
>>> 1995-02-01      15245
>>> 1995-02-02      15245
>>> 1995-02-03      15245
>>> 1995-02-06      15245
>>> 1995-02-07      24110
>>> 1995-02-08      24110
>>> 1995-02-09      24110
>>> 1995-02-10      24110
>>> 1995-02-13      24110
>>> 1995-02-14      17615
>>> 1995-02-15      17615
>>> 1995-02-16      17615
>>> 1995-02-17      17615
>>> 1995-02-21     -23080
>>> 1995-02-22     -23080
>>> 1995-02-23     -23080
>>> 1995-02-24     -23080
>>> 1995-02-27     -23080
>>> 1995-02-28     -17445
>>>
>>> I tried the following "vectorized" solution (example with y = 1 month):
>>>
>>> > ((data - min(last(data, "1 months"))) / (max(last(data, "1 months")) -
>>> +   min(last(data, "1 months")))) * 100
>>>          NonCommNet
>>> 1995-01-03  143.37783
>>> 1995-01-04  143.37783
>>> 1995-01-05  143.37783
>>> 1995-01-06  143.37783
>>> 1995-01-09  143.37783
>>> 1995-01-10  118.48909
>>> 1995-01-11  118.48909
>>> 1995-01-12  118.48909
>>> 1995-01-13  118.48909
>>> 1995-01-16  118.48909
>>> 1995-01-17  130.25005
>>> 1995-01-18  130.25005
>>> 1995-01-19  130.25005
>>> 1995-01-20  130.25005
>>> 1995-01-23  130.25005
>>> 1995-01-24   89.48930
>>> 1995-01-25   89.48930
>>> 1995-01-26   89.48930
>>> 1995-01-27   89.48930
>>> 1995-01-30   89.48930
>>> 1995-01-31   81.21424
>>> 1995-02-01   81.21424
>>> 1995-02-02   81.21424
>>> 1995-02-03   81.21424
>>> 1995-02-06   81.21424
>>> 1995-02-07  100.00000
>>> 1995-02-08  100.00000
>>> 1995-02-09  100.00000
>>> 1995-02-10  100.00000
>>> 1995-02-13  100.00000
>>> 1995-02-14   86.23649
>>> 1995-02-15   86.23649
>>> 1995-02-16   86.23649
>>> 1995-02-17   86.23649
>>> 1995-02-21    0.00000
>>> 1995-02-22    0.00000
>>> 1995-02-23    0.00000
>>> 1995-02-24    0.00000
>>> 1995-02-27    0.00000
>>> 1995-02-28   11.94109
>>>
>>> This does not satisfy my constraints because:
>>> 1) the first month of data should have become NaN or some other special
>>> value, as there is not a full month of previous data available. I think
>>> this is caused by the last() function, which simply returns the available
>>> data if the requested amount of data is greater than the available amount.
>>> 2) the results for the second month of data are wrong. For instance, look
>>> at the result for 1995-02-06, which is 81.21424%. This should have been 0%:
>>> the last month's min() is 15245 (from 1995-02-06), its max() is 44580
>>> (from 1995-01-06), and x itself is 15245.
>>>
>>> From analyzing the results I get the impression that the last() function
>>> is not suited for a "vectorized" solution, but I'm not really sure...
>>>
>>> I also had a look at runMin() and runMax() from the TTR package, but you
>>> can't specify a calendar range with these functions as you can with last()
>>> and first() from the xts package.
>>>
>>> Now my question is: am I doing something wrong here or do you know another
>>> vectorized function that satisfies my constraints?
>>>
>>> Kind regards,
>>>
>>> -Mark-
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>>>
>>
>
>
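For Mark's calendar-window constraint, the lag table can be combined with a mask. His attempt goes wrong because last(data, "1 months") returns one fixed block at the end of the series (February 1995), so min() and max() collapse to the scalars -23080 and 24110 and every row is scaled against them: for 1995-01-03, (44580 + 23080) / (24110 + 23080) * 100 ≈ 143.378, the first value in the quoted output. The sketch below is base R only and makes several assumptions of its own: pct_of_range_month is a hypothetical name, each window runs from the same day one month earlier through the current date (as in Mark's 1995-01-06 to 1995-02-06 example), rows whose window starts before the first observation are set to NA, and month arithmetic relies on POSIXlt normalization, so month-end dates may roll forward.

```r
# Hypothetical helper: percent of range over a trailing calendar-month window.
# 'dates' (sorted Date vector) and 'x' (numeric) stand in for an xts index and data.
pct_of_range_month <- function(dates, x, months = 1) {
  n <- length(x)
  # Window start: same day of month, 'months' earlier. Out-of-range POSIXlt
  # months are normalized on conversion (January - 1 -> previous December).
  lt <- as.POSIXlt(dates)
  lt$mon <- lt$mon - months
  start <- as.Date(lt)
  # Index of the first observation on or after each window start.
  first <- findInterval(start, dates, left.open = TRUE) + 1L
  width <- seq_len(n) - first + 1L   # observations in each window (>= 1)
  K <- max(width)
  # Lag table: column j holds x lagged by j - 1 rows (NA-padded at the top);
  # then blank out lags that fall before each row's window start.
  lagged <- embed(c(rep(NA, K - 1), x), K)
  lagged[col(lagged) > width] <- NA
  cols <- lapply(seq_len(K), function(j) lagged[, j])
  lo <- do.call(pmin, c(cols, na.rm = TRUE))   # row minima, column-wise
  hi <- do.call(pmax, c(cols, na.rm = TRUE))   # row maxima, column-wise
  pct <- (x - lo) / (hi - lo) * 100   # NaN if the window is flat
  pct[start < dates[1]] <- NA         # assumption: no full window of history
  pct
}

# Mark's example series: weekdays 1995-01-03 .. 1995-02-28, without 1995-02-20.
dates <- seq(as.Date("1995-01-03"), as.Date("1995-02-28"), by = "day")
dates <- dates[!(as.POSIXlt(dates)$wday %in% c(0, 6)) &
               dates != as.Date("1995-02-20")]
x <- rep(c(44580, 32835, 38385, 19150, 15245, 24110, 17615, -23080, -17445),
         times = c(5, 5, 5, 5, 5, 5, 4, 5, 1))
pct <- pct_of_range_month(dates, x)
print(data.frame(dates, x, pct))
```

With an xts object, something like pct_of_range_month(as.Date(index(data)), as.numeric(coredata(data))), wrapped back into xts(..., order.by = index(data)), should fit this in. Memory grows with the number of rows times the largest number of observations in any window, which stays small for one month of daily data.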



-- 
Aleks Clark


