[R-SIG-Finance] troubles with apply.daily

Ted Byers r.ted.byers at gmail.com
Mon Jan 30 06:30:04 CET 2012


I have just managed to write a function that applies rollapply to the data
passed in as its argument, and that apply.daily in turn applies to the alpha
object you saw me construct.

It now occurs to me that, because I interpolate to ensure I have a record for
every minute, the interpolation will produce records for the weekends too.
Is there a way to tell apply.daily to apply the supplied function only to
those days when the market was open?
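
One possible approach, sketched here under stated assumptions, is to drop the
weekend rows before calling apply.daily rather than asking apply.daily to skip
them. The sketch relies on .indexwday() from xts, which returns 0 (Sunday)
through 6 (Saturday) in the time zone of the index. alpha_demo is a
hypothetical stand-in for alpha, and exchange holidays would still need
separate handling.

```r
library(xts)

# Hypothetical minute series spanning a weekend:
# Friday 2011-03-11 through Monday 2011-03-14
idx <- seq(as.POSIXct("2011-03-11 00:00:00", tz = "UTC"),
           as.POSIXct("2011-03-14 23:59:00", tz = "UTC"), by = "min")
alpha_demo <- xts(seq_along(idx), order.by = idx)

# Keep Monday (1) through Friday (5); drop Sunday (0) and Saturday (6)
weekdays_only <- alpha_demo[.indexwday(alpha_demo) %in% 1:5]

# apply.daily now only sees the Friday and the Monday
daily_max <- apply.daily(weekdays_only, max)
print(daily_max)
```

Filtering first keeps the function passed to apply.daily free of any calendar
logic. Interpolating only within trading sessions in the first place would
avoid creating the weekend rows at all, but that is a larger change.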

> -----Original Message-----
> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
> Sent: January-29-12 11:59 PM
> To: Ted Byers
> Cc: r-sig-finance at r-project.org
> Subject: Re: [R-SIG-Finance] troubles with apply.daily
> 
> I think I've got it now: consider the following:
> 
> x <- xts(1:500, Sys.Date() + 1:500)
> 
> apply.weekly(x, max)
> apply.weekly(x, str)
> 
> The error message is a little subtle: the problem isn't in
> apply.weekly() but rather in coredata, which means it's actually in the
> function being used to construct the return value. Specifically, your
> problem comes from the fact that str() is one of those rare functions
> called for its side effects, not its return value. str() actually always
> returns NULL invisibly (the printed stuff is a side effect) and it's
> impossible to have a NULL (or actually lots of them) in the values for an
> xts object. I believe if you run apply.weekly with a "real function" like
> max or Hi() or mean() or whatnot, it should work fine. Can you test this?
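
To illustrate the point about str(), here is a minimal sketch. It assumes the
goal is to inspect each chunk while still giving apply.weekly a value to
collect; inspect_max is a hypothetical name, not something from the thread.

```r
library(xts)

x <- xts(1:500, Sys.Date() + 1:500)

# str() prints its summary as a side effect and returns NULL invisibly
res <- str(1:5)
is.null(res)

# Hypothetical inspection function: it still prints the structure of each
# chunk, but its last expression is a number, so there is a value per period
inspect_max <- function(d) {
  str(d)   # side effect only
  max(d)   # return value used to build the result
}

weekly <- apply.weekly(x, inspect_max)
head(weekly)
```

Because the last expression in inspect_max is max(d), every period contributes
one number and the result can be assembled into an xts object.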
> 
> I don't know if this explains your problems with rollapply as you don't
> define what your function is in the code you share.
> 
> Does that work?
> 
> Michael
> 
> On Sun, Jan 29, 2012 at 11:50 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
> > Hi Michael,
> >
> > Thanks for your help.
> >
> > Yes, your example works fine for me.
> >
> > Example data is a problem because of the quantity of data needed to
> > produce the problem.  I have, for example, a file of a few hundred
> > kilobytes, with tick data for less than a day, and the problem does not
> > materialize with it.
> > I have another file of many tens of megabytes on which it invariably
> > happens.  I don't know how large a data file is needed to produce the
> > problem.  I really don't think you want me sending such a large
> > dataset...
> >
> > I have been investigating, and have progressed to a state where I am
> > not mixing types of time objects.
> >
> > One of the curiosities is that it does not affect the processing of
> > relatively tiny files; I see it only on larger files (representing
> > 3 months of data).
> >
> > Now the steps I use to prepare my data are:
> >
> > x = read.table("quotes_M11.dat", header = FALSE, sep="\t", skip=0)
> > str(x)
> >
> > dt<-sprintf("%s %04d",x$V2,x$V4)
> > dt<-as.POSIXlt(dt,format="%Y-%m-%d %H%M")
> > dt <- as.POSIXct(dt)
> > y <- data.frame(dt,x$V5)
> > colnames(y) <- c("tickdate","price")
> > z <- xts(y[,2],y[,1])
> > #alpha <- to.minutes3(z, OHLC=TRUE)
> > alpha <- to.minutes(z, OHLC=TRUE, drop.time=FALSE)
> > colnames(alpha) <- c("Open","High","Low","Close")
> > #tseq <- seq(start(alpha),end(alpha), by = 60)
> > tseq <- seq(start(alpha),end(alpha), by = "min")
> > alpha <- na.approx(alpha, xout = tseq)
> > head(alpha)
> > tail(alpha)
> >
> > The latter two calls produce:
> >
> >> head(alpha)
> >                        Open     High      Low   Close
> > 2011-03-10 00:00:00 10350.00 10365.00 10350.00 10360.0
> > 2011-03-10 00:01:00 10353.33 10363.33 10353.33 10360.0
> > 2011-03-10 00:02:00 10356.67 10361.67 10356.67 10360.0
> > 2011-03-10 00:03:00 10360.00 10360.00 10360.00 10360.0
> > 2011-03-10 00:04:00 10360.00 10360.00 10360.00 10360.0
> > 2011-03-10 00:05:00 10361.50 10361.50 10361.50 10361.5
> >> tail(alpha)
> >                    Open High  Low Close
> > 2011-06-08 23:51:00 9430 9430 9430  9430
> > 2011-06-08 23:52:00 9430 9430 9430  9430
> > 2011-06-08 23:53:00 9430 9430 9430  9430
> > 2011-06-08 23:54:00 9430 9430 9430  9430
> > 2011-06-08 23:55:00 9430 9430 9430  9430
> > 2011-06-08 23:56:00 9430 9430 9430  9430
> >
> >
> > As you can see, the data look fine.
> >
> > Alas, it seems neither rollapply nor apply.daily likes it yet.
> >
> >> tr <- rollapply(alpha,width=20,FUN=rollRegFun,by.column=FALSE,
> > align="right")
> > Error in if (b < 1e-07) { : missing value where TRUE/FALSE needed
> >
> > And the following:
> >
> > myfun <- function(d) {
> >  str(d)
> > }
> > apply.daily(alpha,myfun)
> >
> > produces output right up to the last day, but then dies. Here is the
> > output for the last day, which was processed successfully, followed by
> > the error:
> >
> > An ‘xts’ object from 2011-06-08 to 2011-06-08 23:56:00 containing:
> >  Data: num [1:1437, 1:4] 9435 9435 9440 9445 9450 ...
> >  - attr(*, "dimnames")=List of 2
> >  ..$ : NULL
> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
> >  xts Attributes:
> >  NULL
> > Error in coredata.xts(x) : currently unsupported data type
> >>
> >
> > As you can see, all of the data from the last day for which there was
> > data in the file was processed (in this case by str()), but then it
> > looks like apply.daily() tries to apply myfun to data for the day after
> > the last day for which there is data, and not surprisingly doesn't find
> > any.  I guess the question becomes: why does apply.daily try to go past
> > the last date in the data?
> >
> > I do hope that the problem I am seeing in rollapply and that I see
> > when using apply.daily are related, as that would mean that I fix one
> > and the other gets fixed.
> >
> > Any other ideas?
> >
> > Thanks
> >
> > Ted
> >
> >> -----Original Message-----
> >> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
> >> Sent: January-29-12 11:02 PM
> >> To: Ted Byers
> >> Cc: r-sig-finance at r-project.org
> >> Subject: Re: [R-SIG-Finance] troubles with apply.daily
> >>
> >> I'm not sure time() is very good for what you want to do. It's tied
> >> to R's builtin ts class, which, and it pains me to say this about R,
> >> really isn't very good (at least for finance-y things). I think all
> >> your problems come from that...
> >>
> >> Perhaps construct your new index sequence as:
> >>
> >> seq(start(alpha), end(alpha), by = "min")
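
The difference the by argument makes can be seen in a small sketch. The start
and end times below are hypothetical. A numeric by is interpreted as seconds,
which would be consistent with the per-second rows reported earlier in the
thread if time("00:01:00") evaluated to 1; that explanation is an assumption
and is not confirmed in the thread.

```r
# Hypothetical start and end times, one hour apart
t0 <- as.POSIXct("2011-03-10 00:00:00", tz = "UTC")
t1 <- as.POSIXct("2011-03-10 01:00:00", tz = "UTC")

# A numeric 'by' is taken as a number of seconds
by_second <- seq(t0, t1, by = 1)
length(by_second)

# A character 'by' names the unit, so this steps one minute at a time
by_minute <- seq(t0, t1, by = "min")
length(by_minute)

# Stepping by 60 seconds gives the same grid as by = "min"
all(seq(t0, t1, by = 60) == by_minute)
```

Spelling the unit out as by = "min" makes the intended spacing explicit and
avoids depending on how another function's return value is coerced.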
> >>
> >> Since you didn't supply example data, let's try this (admittedly
> >> absurd) analysis to showcase how these techniques should work:
> >>
> >> library(quantmod)
> >> getSymbols("AAPL")
> >> AAPL <- Cl(AAPL)
> >>
> >> ## Force to have daily (including non-trading days) points
> >> AAPL <- na.approx(AAPL, xout = seq(start(AAPL), end(AAPL), by = "day"))
> >>
> >> # check that it worked
> >> head(AAPL, 20)
> >>
> >> # Now we aggregate to weekly
> >> AAPL.w <- to.weekly(AAPL)
> >>
> >> # and now we apply a function monthly
> >> apply.monthly(AAPL.w, max)
> >>
> >> So everything seems to be in order. Does this help?
> >>
> >> Michael
> >>
> >>
> >> On Sun, Jan 29, 2012 at 12:51 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
> >> > I do not understand this well enough to figure out the cause, let
> >> > alone the fix.
> >> >
> >> >
> >> >
> >> > Here is what I tried:
> >> >
> >> >
> >> >
> >> > myfun <- function(d) {
> >> >
> >> >  str(d)
> >> >
> >> > }
> >> >
> >> > apply.daily(alpha,myfun)
> >> >
> >> >
> >> >
> >> > And here is what the beginning and end of alpha (an xts object
> >> > created by to.minutes()) look like:
> >> >
> >> >
> >> >
> >> >> head(alpha)
> >> >
> >> >                        Open     High      Low Close
> >> >
> >> > 2011-03-10 00:00:00 10350.00 10365.00 10350.00 10360
> >> >
> >> > 2011-03-10 00:00:01 10350.06 10364.97 10350.06 10360
> >> >
> >> > 2011-03-10 00:00:02 10350.11 10364.94 10350.11 10360
> >> >
> >> > 2011-03-10 00:00:03 10350.17 10364.92 10350.17 10360
> >> >
> >> > 2011-03-10 00:00:04 10350.22 10364.89 10350.22 10360
> >> >
> >> > 2011-03-10 00:00:05 10350.28 10364.86 10350.28 10360
> >> >
> >> >> tail(alpha)
> >> >
> >> >                    Open High  Low Close
> >> >
> >> > 2011-06-08 23:55:55 9430 9430 9430  9430
> >> >
> >> > 2011-06-08 23:55:56 9430 9430 9430  9430
> >> >
> >> > 2011-06-08 23:55:57 9430 9430 9430  9430
> >> >
> >> > 2011-06-08 23:55:58 9430 9430 9430  9430
> >> >
> >> > 2011-06-08 23:55:59 9430 9430 9430  9430
> >> >
> >> > 2011-06-08 23:56:00 9430 9430 9430  9430
> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > There is almost three months of tick data here, converted to one
> >> > minute OHLC data.
> >> >
> >> >
> >> >
> >> > I had apparently successfully used the following to ensure I had an
> >> > even time series with values for every minute from start to end:
> >> >
> >> >
> >> >
> >> > tseq <- seq(start(alpha),end(alpha), by = time("00:01:00"))
> >> >
> >> > alpha <- na.approx(alpha, xout = tseq)
> >> >
> >> >
> >> >
> >> > But there is something weird here.  How is it that alpha appears to
> >> > have rows for every second from the start to the end, rather than
> >> > 'just' for every minute?
> >> >
> >> >
> >> >
> >> > Now here is what the output looks like:
> >> >
> >> >
> >> >
> >> > An 'xts' object from 2011-06-05 to 2011-06-05 23:59:59 containing:
> >> >
> >> >  Data: num [1:86400, 1:4] 9437 9437 9437 9437 9437 ...
> >> >
> >> > - attr(*, "dimnames")=List of 2
> >> >
> >> >  ..$ : NULL
> >> >
> >> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
> >> >
> >> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
> >> >
> >> >  xts Attributes:
> >> >
> >> >  NULL
> >> >
> >> > An 'xts' object from 2011-06-06 to 2011-06-06 23:59:59 containing:
> >> >
> >> >  Data: num [1:86400, 1:4] 9420 9420 9420 9420 9420 ...
> >> >
> >> > - attr(*, "dimnames")=List of 2
> >> >
> >> >  ..$ : NULL
> >> >
> >> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
> >> >
> >> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
> >> >
> >> >  xts Attributes:
> >> >
> >> >  NULL
> >> >
> >> > An 'xts' object from 2011-06-07 to 2011-06-07 23:59:59 containing:
> >> >
> >> >  Data: num [1:86400, 1:4] 9428 9428 9428 9428 9428 ...
> >> >
> >> > - attr(*, "dimnames")=List of 2
> >> >
> >> >  ..$ : NULL
> >> >
> >> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
> >> >
> >> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
> >> >
> >> >  xts Attributes:
> >> >
> >> >  NULL
> >> >
> >> > An 'xts' object from 2011-06-08 to 2011-06-08 23:56:00 containing:
> >> >
> >> >  Data: num [1:86161, 1:4] 9435 9435 9435 9435 9435 ...
> >> >
> >> > - attr(*, "dimnames")=List of 2
> >> >
> >> >  ..$ : NULL
> >> >
> >> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
> >> >
> >> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
> >> >
> >> >  xts Attributes:
> >> >
> >> >  NULL
> >> >
> >> > Error in coredata.xts(x) : currently unsupported data type
> >> >
> >> >
> >> >
> >> > Now, I do not understand what is happening here.  The data seem
> >> > consistent throughout, so why would it crash and burn on the very
> >> > last day, and only on that day, of the three months of data?
> >> >
> >> >
> >> >
> >> > Any insight would be greatly appreciated.
> >> >
> >> >
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> > Ted
> >> >
> >> >
> >