[R-SIG-Finance] aligning time series data

Gabor Grothendieck ggrothendieck at gmail.com
Sun Sep 12 23:40:47 CEST 2010


On Sun, Sep 12, 2010 at 3:45 PM, Anil Vijendran
<anil.vijendran at gmail.com> wrote:
> Hi, I have a data frame that looks like this:
>
>> print(tmp.2, row.names=F)
>       Date        A       B        C        D
>  2009-06-30 437.4941 1498.94 3013.123 7771.071
>  2009-07-31 445.5752 1612.31 3182.020 7800.285
>  2009-08-31 460.4486 1670.52 3230.984 7900.123
>  2009-09-30 476.5982 1732.86 3438.276 7980.975
>  2009-10-31 490.8214 1700.67 3205.386 7937.876
>  2009-11-30 491.5338 1802.68 3329.164 8091.661
>  2009-12-31 492.6630 1837.50 3257.927 7701.011
>  2010-01-29 495.9595      NA       NA       NA
>  2010-01-31 495.9595 1771.40 3280.050 7865.038
>  2010-02-26 493.2970      NA       NA       NA
>  2010-02-28 493.2970 1826.27 3373.147 7901.680
>  2010-03-31 508.8930 1936.48 3236.337 7778.707
>  2010-04-30 515.2000 1967.05 3356.522 7899.239
>  2010-05-28 507.9438      NA       NA       NA
>  2010-05-30       NA      NA 3446.907       NA
>  2010-05-31 507.9438 1809.98       NA 8172.843
>  2010-06-30 513.0341 1715.23 3454.708 8426.962
>  2010-07-30 529.3736      NA       NA       NA
>  2010-07-31       NA 1835.40 3568.250 8468.670
>  2010-08-31 553.9723 1752.55       NA 8824.940
>>
>
> Can someone recommend a way to align all these series on the same date (last
> day of month) - i.e for each series pick the value as of the last day of the
> month if its non-NA, else use the previous value for the last day of the
> month? I was planning to use timeSeries, but solutions that use zoo or xts
> will be great as well.
>
> I could build something myself but am hoping there is a robust solution
> already available to solve a common problem like this.
>

Try this:

library(zoo)
z <- read.zoo(DF)
na.locf(z)[!duplicated(as.yearmon(time(z)), fromLast = TRUE)]

That is:

> DF <-  # input data frame
+ structure(list(Date = structure(c(14425, 14456, 14487, 14517,
+ 14548, 14578, 14609, 14638, 14640, 14666, 14668, 14699, 14729,
+ 14757, 14759, 14760, 14790, 14820, 14821, 14852), class = "Date"),
+     A = c(437.4941, 445.5752, 460.4486, 476.5982, 490.8214, 491.5338,
+     492.663, 495.9595, 495.9595, 493.297, 493.297, 508.893, 515.2,
+     507.9438, NA, 507.9438, 513.0341, 529.3736, NA, 553.9723),
+     B = c(1498.94, 1612.31, 1670.52, 1732.86, 1700.67, 1802.68,
+     1837.5, NA, 1771.4, NA, 1826.27, 1936.48, 1967.05, NA, NA,
+     1809.98, 1715.23, NA, 1835.4, 1752.55), C = c(3013.123, 3182.02,
+     3230.984, 3438.276, 3205.386, 3329.164, 3257.927, NA, 3280.05,
+     NA, 3373.147, 3236.337, 3356.522, NA, 3446.907, NA, 3454.708,
+     NA, 3568.25, NA), D = c(7771.071, 7800.285, 7900.123, 7980.975,
+     7937.876, 8091.661, 7701.011, NA, 7865.038, NA, 7901.68,
+     7778.707, 7899.239, NA, NA, 8172.843, 8426.962, NA, 8468.67,
+     8824.94)), .Names = c("Date", "A", "B", "C", "D"),
+     row.names = c(NA, -20L), class = "data.frame")
>
> library(zoo)
> z <- read.zoo(DF) # convert to zoo
> na.locf(z)[!duplicated(as.yearmon(time(z)), fromLast = TRUE)]
                  A       B        C        D
2009-06-30 437.4941 1498.94 3013.123 7771.071
2009-07-31 445.5752 1612.31 3182.020 7800.285
2009-08-31 460.4486 1670.52 3230.984 7900.123
2009-09-30 476.5982 1732.86 3438.276 7980.975
2009-10-31 490.8214 1700.67 3205.386 7937.876
2009-11-30 491.5338 1802.68 3329.164 8091.661
2009-12-31 492.6630 1837.50 3257.927 7701.011
2010-01-31 495.9595 1771.40 3280.050 7865.038
2010-02-28 493.2970 1826.27 3373.147 7901.680
2010-03-31 508.8930 1936.48 3236.337 7778.707
2010-04-30 515.2000 1967.05 3356.522 7899.239
2010-05-31 507.9438 1809.98 3446.907 8172.843
2010-06-30 513.0341 1715.23 3454.708 8426.962
2010-07-31 529.3736 1835.40 3568.250 8468.670
2010-08-31 553.9723 1752.55 3568.250 8824.940


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-SIG-Finance mailing list