[R-SIG-Finance] troubles with apply.daily

R. Michael Weylandt michael.weylandt at gmail.com
Mon Jan 30 05:59:00 CET 2012


I think I've got it now. Consider the following:

x <- xts(1:500, Sys.Date() + 1:500)

apply.weekly(x, max)
apply.weekly(x, str)

The error message is a little subtle: the problem isn't in
apply.weekly() itself but rather in coredata(), which means it arises
while the return value is being constructed. Specifically, your problem
comes from the fact that str() is one of those rare functions called
for its side effects, not its return value. str() always returns NULL
invisibly (the printed output is a side effect), and it's impossible to
have a NULL (or, here, lots of them) as the values of an xts object. I
believe that if you run apply.weekly with a "real function" like max or
Hi() or mean() or whatnot, it should work fine. Can you test this?
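
To make that concrete, here is a minimal sketch on made-up data (the
series x, its dates and the helper inspect_then_max are hypothetical,
not taken from your file). It checks that str() returns NULL, that the
last period boundary is simply the last row (so no extra day is being
processed), and that a wrapper which prints and then returns a value
works:

```r
library(xts)

# Made-up minute series covering exactly three days (not the real data).
idx <- seq(as.POSIXct("2011-06-06 00:00:00", tz = "UTC"),
           by = "min", length.out = 3 * 1440)
x <- xts(seq_along(idx), idx)

# str() prints a summary and returns NULL invisibly.
res <- str(1:5)
is.null(res)

# The period boundaries stop at the last row; there is no extra period.
ep <- endpoints(x, on = "days")
tail(ep, 1) == nrow(x)

# Hypothetical wrapper: print the chunk, then return one number per day.
inspect_then_max <- function(d) {
  str(d)
  max(d)
}
daily_max <- apply.daily(x, inspect_then_max)
daily_max
```

If you only want the printed output, a plain loop over the endpoints
avoids building an xts result at all.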

I don't know if this explains your problems with rollapply, since the
code you shared doesn't show how rollRegFun is defined.
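
For what it's worth, that particular message is what R reports when the
condition of an if() evaluates to NA. The sketch below only reproduces
the message; safe_check and the value of b are hypothetical stand-ins,
since I can't see what rollRegFun actually computes:

```r
# Hypothetical stand-in for whatever rollRegFun computes as 'b'.
b <- NA_real_

# An NA condition in if() raises the error; capture its message.
msg <- tryCatch(
  {
    if (b < 1e-07) "small" else "large"
  },
  error = function(e) conditionMessage(e)
)
msg

# One possible guard (an assumption, not a fix for the real function):
# return NA early instead of letting if() see a missing value.
safe_check <- function(b) {
  if (is.na(b)) {
    return(NA_character_)
  }
  if (b < 1e-07) "small" else "large"
}
safe_check(NA_real_)
safe_check(0.5)
```

So it may be worth checking whether any window passed to your function
yields an NA (or NaN) before the comparison.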

Does that work?

Michael

On Sun, Jan 29, 2012 at 11:50 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
> Hi Michael,
>
> Thanks for your help.
>
> Yes, your example works fine for me.
>
> Example data is a problem because of the quantity of data needed to produce
> the problem.  I have, for example, a file of a few hundred kilobytes, for
> tick data for less than a day, and the problem does not materialize with it.
> I have another file of many tens of megabytes on which it invariably
> happens.  I don't know how large a data file is needed to produce the
> problem.  I really don't think you want me sending such a large dataset...
>
> I have been investigating, and have progressed to a state where I am not
> mixing types of time objects.
>
> One of the curiosities is that it does not affect the processing of
> relatively tiny files; I see it only on larger files (representing 3
> months of data).
>
> Now the steps I use to prepare my data are:
>
> x = read.table("quotes_M11.dat", header = FALSE, sep="\t", skip=0)
> str(x)
>
> dt<-sprintf("%s %04d",x$V2,x$V4)
> dt<-as.POSIXlt(dt,format="%Y-%m-%d %H%M")
> dt <- as.POSIXct(dt)
> y <- data.frame(dt,x$V5)
> colnames(y) <- c("tickdate","price")
> z <- xts(y[,2],y[,1])
> #alpha <- to.minutes3(z, OHLC=TRUE)
> alpha <- to.minutes(z, OHLC=TRUE, drop.time=FALSE)
> colnames(alpha) <- c("Open","High","Low","Close")
> #tseq <- seq(start(alpha),end(alpha), by = 60)
> tseq <- seq(start(alpha),end(alpha), by = "min")
> alpha <- na.approx(alpha, xout = tseq)
> head(alpha)
> tail(alpha)
>
> The latter two calls produce:
>
>> head(alpha)
>                        Open     High      Low   Close
> 2011-03-10 00:00:00 10350.00 10365.00 10350.00 10360.0
> 2011-03-10 00:01:00 10353.33 10363.33 10353.33 10360.0
> 2011-03-10 00:02:00 10356.67 10361.67 10356.67 10360.0
> 2011-03-10 00:03:00 10360.00 10360.00 10360.00 10360.0
> 2011-03-10 00:04:00 10360.00 10360.00 10360.00 10360.0
> 2011-03-10 00:05:00 10361.50 10361.50 10361.50 10361.5
>> tail(alpha)
>                    Open High  Low Close
> 2011-06-08 23:51:00 9430 9430 9430  9430
> 2011-06-08 23:52:00 9430 9430 9430  9430
> 2011-06-08 23:53:00 9430 9430 9430  9430
> 2011-06-08 23:54:00 9430 9430 9430  9430
> 2011-06-08 23:55:00 9430 9430 9430  9430
> 2011-06-08 23:56:00 9430 9430 9430  9430
>
>
> As you can see, the data look fine.
>
> Alas, it seems neither rollapply nor apply.daily like it, yet.
>
>> tr <- rollapply(alpha,width=20,FUN=rollRegFun,by.column=FALSE,
> align="right")
> Error in if (b < 1e-07) { : missing value where TRUE/FALSE needed
>
> And the following:
>
> myfun <- function(d) {
>  str(d)
> }
> apply.daily(alpha,myfun)
>
> Produces output right to the last day, but then dies (the output from the
> last day processed successfully and the error):
>
> An ‘xts’ object from 2011-06-08 to 2011-06-08 23:56:00 containing:
>  Data: num [1:1437, 1:4] 9435 9435 9440 9445 9450 ...
>  - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr [1:4] "Open" "High" "Low" "Close"
>  Indexed by objects of class: [POSIXct,POSIXt] TZ:
>  xts Attributes:
>  NULL
> Error in coredata.xts(x) : currently unsupported data type
>>
>
> As you can see, all of the data from the last day for which there was data
> in the file was processed (in this case by str()), but then it looks like
> apply.daily() tries to apply myfun on data for the day after the last day for
> which there is data, and not surprisingly doesn't find any.  I guess the
> question becomes: why does apply.daily try to go past the last date in the
> data?
>
> I do hope that the problem I am seeing in rollapply and that I see when
> using apply.daily are related, as that would mean that I fix one and the
> other gets fixed.
>
> Any other ideas?
>
> Thanks
>
> Ted
>
>> -----Original Message-----
>> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
>> Sent: January-29-12 11:02 PM
>> To: Ted Byers
>> Cc: r-sig-finance at r-project.org
>> Subject: Re: [R-SIG-Finance] troubles with apply.daily
>>
>> I'm not sure time() is very good for what you want to do. It's tied to R's
>> builtin ts class, which, and it pains me to say this about R, really isn't
>> very good (at least for finance-y things). I think all your problems come
>> from that...
>>
>> Perhaps construct your new index sequence as:
>>
>> seq(start(alpha), end(alpha), by = "min")
>>
>> Since you didn't supply example data, let's try this (admittedly
>> absurd) analysis to showcase how these techniques should work:
>>
>> library(quantmod)
>> getSymbols("AAPL")
>> AAPL <- Cl(AAPL)
>>
>> ## Force to have daily (including non-trading days) points
>> AAPL <- na.approx(AAPL, xout = seq(start(AAPL), end(AAPL), by = "day"))
>>
>> # check that it worked
>> head(AAPL, 20)
>>
>> # Now we aggregate to weekly
>> AAPL.w <- to.weekly(AAPL)
>>
>> # and now we apply a function monthly
>> apply.monthly(AAPL.w, max)
>>
>> So everything seems to be in order. Does this help?
>>
>> Michael
>>
>>
>> On Sun, Jan 29, 2012 at 12:51 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
>> > I do not understand this well enough to figure out the cause, let alone
>> > the fix.
>> >
>> >
>> >
>> > Here is what I tried:
>> >
>> >
>> >
>> > myfun <- function(d) {
>> >  str(d)
>> > }
>> > apply.daily(alpha,myfun)
>> >
>> >
>> >
>> > And here are the beginning and end of alpha (an xts object
>> > created by to.minutes()):
>> >
>> >
>> >
>> >> head(alpha)
>> >                        Open     High      Low Close
>> > 2011-03-10 00:00:00 10350.00 10365.00 10350.00 10360
>> > 2011-03-10 00:00:01 10350.06 10364.97 10350.06 10360
>> > 2011-03-10 00:00:02 10350.11 10364.94 10350.11 10360
>> > 2011-03-10 00:00:03 10350.17 10364.92 10350.17 10360
>> > 2011-03-10 00:00:04 10350.22 10364.89 10350.22 10360
>> > 2011-03-10 00:00:05 10350.28 10364.86 10350.28 10360
>> >> tail(alpha)
>> >                    Open High  Low Close
>> > 2011-06-08 23:55:55 9430 9430 9430  9430
>> > 2011-06-08 23:55:56 9430 9430 9430  9430
>> > 2011-06-08 23:55:57 9430 9430 9430  9430
>> > 2011-06-08 23:55:58 9430 9430 9430  9430
>> > 2011-06-08 23:55:59 9430 9430 9430  9430
>> > 2011-06-08 23:56:00 9430 9430 9430  9430
>> >
>> >>
>> >
>> >
>> >
>> > There is almost three months of tick data here, converted to one
>> > minute OHLC data.
>> >
>> >
>> >
>> > I had apparently successfully used the following to ensure I had an
>> > even time series with values for every minute from start to end:
>> >
>> >
>> >
>> > tseq <- seq(start(alpha),end(alpha), by = time("00:01:00"))
>> >
>> > alpha <- na.approx(alpha, xout = tseq)
>> >
>> >
>> >
>> > But there is something weird here.  How is it that alpha appears to
>> > have rows for every second from the start to the end, rather than
>> > 'just' for every minute?
>> >
>> >
>> >
>> > Now here is what the output looks like:
>> >
>> >
>> >
>> > An 'xts' object from 2011-06-05 to 2011-06-05 23:59:59 containing:
>> >  Data: num [1:86400, 1:4] 9437 9437 9437 9437 9437 ...
>> > - attr(*, "dimnames")=List of 2
>> >  ..$ : NULL
>> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
>> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
>> >  xts Attributes:
>> >  NULL
>> > An 'xts' object from 2011-06-06 to 2011-06-06 23:59:59 containing:
>> >  Data: num [1:86400, 1:4] 9420 9420 9420 9420 9420 ...
>> > - attr(*, "dimnames")=List of 2
>> >  ..$ : NULL
>> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
>> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
>> >  xts Attributes:
>> >  NULL
>> > An 'xts' object from 2011-06-07 to 2011-06-07 23:59:59 containing:
>> >  Data: num [1:86400, 1:4] 9428 9428 9428 9428 9428 ...
>> > - attr(*, "dimnames")=List of 2
>> >  ..$ : NULL
>> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
>> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
>> >  xts Attributes:
>> >  NULL
>> > An 'xts' object from 2011-06-08 to 2011-06-08 23:56:00 containing:
>> >  Data: num [1:86161, 1:4] 9435 9435 9435 9435 9435 ...
>> > - attr(*, "dimnames")=List of 2
>> >  ..$ : NULL
>> >  ..$ : chr [1:4] "Open" "High" "Low" "Close"
>> >  Indexed by objects of class: [POSIXct,POSIXt] TZ:
>> >  xts Attributes:
>> >  NULL
>> > Error in coredata.xts(x) : currently unsupported data type
>> >
>> >
>> >
>> > Now, I do not understand what is happening here.  The data seem
>> > consistent throughout, so why would it crash and burn on the very last
>> > day, and only on that day, of the three months of data?
>> >
>> >
>> >
>> > Any insight would be greatly appreciated.
>> >
>> >
>> >
>> > Thanks
>> >
>> >
>> >
>> > Ted
>> >
>> >