[R-SIG-Finance] IBrokers - reqHistory results in missing random data

s algotr8der at gmail.com
Thu May 19 01:51:40 CEST 2011


On 5/18/11 4:25 PM, Jeffrey Ryan wrote:
> On Wed, May 18, 2011 at 2:33 PM, algotr8der <algotr8der at gmail.com> wrote:
>
>> Has anyone come across situations where the export of historical data from
>> IB's API has missing data points that occur randomly?
>>
> I'm not too sure about 'randomly' but I have seen something similar in
> terms of missing dates/times/contracts on occasion.
>
>
>> I did the following to download data for the Select SPDR etfs:
>>
>>> tws <- twsConnect()
>>> contract <- twsEquity('XHB','SMART','ISLAND')
>>> reqHistory(tws, Contract=contract) -> XHB
>> By default this is set to retrieve 1 year's worth of minute data. I received
>> all of the data for XLE but noticed that XHB has fewer data points. I pulled
>> up the chart of XHB to examine whether those missing data points showed up
>> on the chart but all was well on the TWS Chart. As per IB, the backfilling
>> on the TWS Chart uses the same data export framework as that used by
>> reqHistoricalData. So I decided to re-download XHB. This time the missing
>> data points were different from the previous download.
>>
> reqHistory isn't much more than an lapply over/around the max download limit
> per call.  Maybe you could send me off-list the output of your request to
> see if I get the same issue.  Another thing to help debug is to run this on
> the IBGateway - and send me a copy of the log file.  setServerLogLevel(tws,
> 5) might do the same as well.
>
> I also would argue with IB that they aren't using the same framework for the
> backfills - since you can do more in the TWS than the API allows - something
> *is* different even at the user level.

Thanks for looking at this, Jeff. I appreciate it.

I share your feeling that something *is* different between the two
frameworks. I had a long discussion with one of IB's API representatives
today about that but did not make much progress. I have to write up a
ticket, but I thought I would investigate further first.

So I executed reqHistory() using the IBGateway as you suggested. I will
upload the logs in a follow-up post, as the API log is rather large and I
don't want to clog people's inboxes.

This time when I downloaded XHB, the following dates (see below) had
incomplete data. The number below each date is the count of individual
one-minute data points present for that day. The day after Thanksgiving
(2010-11-26) should be the only exception, as it is a half trading day.

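library(xts)  # provides split() and index() methods for xts objects

# Split the minute bars into a list with one xts object per trading day
testXHB <- split(XHB, f = "days")
for (i in seq_along(testXHB)) {
        print(index(testXHB[[i]])[1])  # timestamp of the day's first bar
        print(nrow(testXHB[[i]]))      # number of bars present for that day
}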

[1] "2010-07-19 09:30:00 EDT"
[1] 389
[1] "2010-09-08 09:30:00 EDT"
[1] 389
[1] "2010-10-22 09:30:00 EDT"
[1] 389
[1] "2010-10-26 09:30:00 EDT"
[1] 389
[1] "2010-11-17 09:30:00 EST"
[1] 389
[1] "2010-11-26 09:30:00 EST"
[1] 210
[1] "2010-11-30 09:30:00 EST"
[1] 389
[1] "2010-12-30 09:30:00 EST"
[1] 389
[1] "2010-12-31 09:30:00 EST"
[1] 389
[1] "2011-02-14 09:30:00 EST"
[1] 389
[1] "2011-03-11 09:30:00 EST"
[1] 386
[1] "2011-04-14 09:30:00 EDT"
[1] 389
[1] "2011-04-25 09:30:00 EDT"
[1] 387
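
The per-day count above can also be filtered so that only short days are printed. The following is a minimal sketch, not part of IBrokers: the helper name incomplete_days and the default of 390 expected bars (09:30 through 15:59 inclusive) are assumptions for illustration, and the demo data is synthetic rather than an actual IB download.

```r
library(xts)

# Hypothetical helper: report days with fewer bars than `expected`.
# 390 assumes a full regular session of one-minute bars, 09:30 to 15:59.
incomplete_days <- function(x, expected = 390L) {
  days   <- split(x, f = "days")
  counts <- vapply(days, nrow, integer(1))
  dates  <- vapply(days, function(d) format(index(d)[1], "%Y-%m-%d"),
                   character(1))
  data.frame(date = dates, bars = counts,
             stringsAsFactors = FALSE)[counts < expected, ]
}

# Synthetic demo: two sessions, with four bars removed from the second
tz   <- "America/New_York"
day1 <- seq(as.POSIXct("2011-03-10 09:30:00", tz = tz),
            by = "1 min", length.out = 390)
day2 <- seq(as.POSIXct("2011-03-11 09:30:00", tz = tz),
            by = "1 min", length.out = 390)
idx  <- c(day1, day2[-c(100, 200, 300, 301)])
demo <- xts(seq_along(idx), order.by = idx, tzone = tz)

flagged <- incomplete_days(demo)
print(flagged)
```

Half trading days would still be flagged by this check, so a known half day such as the day after Thanksgiving has to be excluded by hand or given its own expected count.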


>> I used IB's TwsDde Excel file to cross-verify and I noticed that the data is
>> present using the Excel API. It could be that the problem did not surface
>> because TwsDde limits the export of 1 minute data to 2 days' worth of data.
>> I'm speculating here but I do know that downloading via reqHistory produces
>> data with missing data points that occur randomly.
>>
> The excel variant uses ActiveX - and I suspect it isn't really the same as
> the socket version (Java, IBrokers, etc).  Test using the distributed Java
> example program (or write one).  That would be more apples to apples.

Later today or sometime tomorrow I will test a Java example program to
compare against the ActiveX version. I will provide further feedback.

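
Whichever client produces the second data set, the comparison itself can be done in R once both series are xts objects. The sketch below assumes both series use the same timestamp convention; missing_from and combine_downloads are hypothetical helper names, and the demo series are synthetic. It lists the timestamps one download lacks relative to the other, and merges two partial downloads into one series.

```r
library(xts)

# Hypothetical helper: timestamps present in `a` but absent from `b`
missing_from <- function(a, b) {
  index(a)[!(as.numeric(index(a)) %in% as.numeric(index(b)))]
}

# Hypothetical helper: union of two downloads, keeping the first copy of
# any timestamp that appears in both
combine_downloads <- function(a, b) {
  u <- rbind(a, b)              # rbind on xts objects orders rows by time
  u[!duplicated(index(u)), ]
}

# Synthetic demo: five one-minute bars, each download missing a different one
t      <- as.POSIXct("2011-03-11 09:30:00", tz = "America/New_York") + 60 * (0:4)
full   <- xts(1:5, order.by = t)
first  <- full[-2, ]            # first download lacks the 09:31 bar
second <- full[-4, ]            # second download lacks the 09:33 bar

gaps_in_first  <- missing_from(second, first)
gaps_in_second <- missing_from(first, second)
combined       <- combine_downloads(first, second)

print(gaps_in_first)
print(gaps_in_second)
print(nrow(combined))
```

If the missing points really do differ between downloads, combining repeated downloads this way is one possible workaround, though it does not explain why the gaps occur.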
>> The other thing I noticed was that the data pulled by reqHistory begins at
>> 9:30:00 and ends at 15:59:00 while the same using TwsDde begins at 9:31:00
>> and ends at 16:00:00.
>>
>> 2010-06-18 15:58:00    15.85    15.85   15.84     15.84       1569  15.843       0       337
>> 2010-06-18 15:59:00    15.84    15.85   15.81     15.81       3518  15.828       0       527
>> 2010-06-21 09:30:00    16.04    16.10   16.03     16.09        240  16.047       0        47
>> 2010-06-21 09:31:00    16.09    16.09   16.07     16.08        226  16.081       0       119
>> 2010-05-17 09:31:00    18.00    18.03   18.00     18.02        115  18.020       0        39
>>
> This is a potential indication of the differences internal to the socket vs.
> activeX.  From the log I get 20100611 14:59:00 as the last data stamp.  That
> is how bars get printed by the API as well - they use the time from the
> start of the minute, not the following one.  It is dumb - as this can then
> introduce a lookahead bias if you aren't aware/paying attention.  Or if you
> are merging with other data sources it causes havoc as well.  Point is,
> IBrokers isn't doing anything to the timestamp - it is coming from the
> TWS/IBG that way. You can set the output to be in POSIX seconds since the
> epoch, though I am not too sure what that would do in terms of stamps.  I'll
> check ...
>
> Best,
> Jeff

This is an issue, and I really wonder why they are doing this. I need to
follow up with IB on this.
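
Until that is resolved, one way to line the socket data up with end-of-bar stamps such as those from TwsDde is to shift the index forward by one bar length. This is a minimal sketch that assumes the only difference between the two sources is the labelling convention, which this thread suggests but does not confirm. The helper name to_bar_end is hypothetical; the two closing prices in the demo are taken from the rows quoted above.

```r
library(xts)

# Hypothetical helper: relabel each bar from its start time to its end time
to_bar_end <- function(x, bar_seconds = 60) {
  index(x) <- index(x) + bar_seconds
  x
}

# Demo with the last two closes of 2010-06-18 from the quoted rows
t    <- as.POSIXct(c("2010-06-18 15:58:00", "2010-06-18 15:59:00"),
                   tz = "America/New_York")
bars <- xts(c(15.84, 15.81), order.by = t)

shifted <- to_bar_end(bars)
print(format(index(shifted), "%H:%M"))
```

Stamping each bar with its end time also avoids the lookahead issue Jeff describes, since a bar then only carries a time at which all of its data was known.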
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/IBrokers-reqHistory-results-in-missing-random-data-tp3533694p3533694.html
>> Sent from the Rmetrics mailing list archive at Nabble.com.


