[R] [R-SIG-Finance] RTAQ - convert function: warning causes incorrect loading of data
Nicolae Caprarescu
caprarn9 at cs.man.ac.uk
Mon Oct 15 22:30:50 CEST 2012
Hello,
Thanks for looking into it. What process do people usually follow to fix
bugs in RTAQ? It took me a while to realise what was wrong with it, so it
would be great if we could address it so that others don't run into it.
Best,
Nicolae
On Sat, 13 Oct 2012 12:33:01 -0500, Jeff Ryan <jeff.a.ryan at gmail.com>
wrote:
> FWIW %m is the proper conversion for months. %M is minutes.
>
> Looks like a bug.
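For reference, a minimal base-R sketch of the difference between the two conversions. It calls strptime directly rather than RTAQ or timeDate (an assumption made so that it runs without either package), and the sample timestamp is the first row of the trades file quoted further down.

```r
# One timestamp in the NYSE TAQ layout: DATE and TIME pasted together.
stamp <- "20101101 10:30:00"

# Correct format: %m is the month, %d the day of month, %M the minutes.
good <- strptime(stamp, format = "%Y%m%d %H:%M:%S", tz = "GMT")

# Format with the wrong case: %M consumes "11" as minutes and %D is
# shorthand for %m/%d/%y, so the rest of the string cannot be matched.
bad <- strptime(stamp, format = "%Y%M%D %H:%M:%S", tz = "GMT")

print(good)       # a valid date-time
print(is.na(bad)) # the failed parse shows up as NA, not as an error
```

A failed parse only yields NA and raises neither an error nor a warning, which would be consistent with convert running silently and the problem only showing up in the index after loading.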
>
> Jeffrey Ryan | Founder | jeffrey.ryan at lemnica.com
>
> www.lemnica.com
>
> On Oct 13, 2012, at 10:33 AM, Nicolae Caprarescu <caprarn9 at cs.man.ac.uk>
> wrote:
>
>> Hi Michael,
>>
>> Thank you for pointing me in the right direction; I'm now using an email
>> client rather than Nabble.
>>
>> The issue I described below is now resolved; I managed to fix it myself.
>> However, I believe this might be a bug, or at least something that needs
>> improving. I have described both how to reproduce the issue and its
>> solution in the four steps below:
>> 1) library(RTAQ)
>> 2) Create XXX_trades.csv file with the contents below using a relative
>> path like [somewhere]/TAQData/2010-11-01/XXX_trades.csv
>> SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
>> XXX,20101101,10:30:00,11.49,500,0,0,@,B
>> XXX,20101101,10:30:02,11.49,322,0,0,0,B
>> XXX,20101101,10:30:02,11.49,178,0,0,@,B
>> XXX,20101101,10:30:03,11.49,500,0,0,@,B
>> XXX,20101101,10:30:03,11.49,187,0,0,@,D
>> 3)
>> # convert does not generate any errors/warnings, however it does not work
>> # properly
>> convert(from="2010-11-01", to="2010-11-01",
>> datasource="[somewhere]/TAQData/",
>> datadestination="[somewhere]/TAQDataRData/", trades=T, quotes=F,
>> ticker="XXX", dir=T,
>> extention="csv", header=T, tradecolnames=c("SYMBOL", "DATE", "TIME",
>> "PRICE", "SIZE", "G127", "CORR", "COND", "EX"))
>> # loading the RData created by convert
>> TAQLoad("XXX", from="2010-11-01", to="2010-11-01",
>> datasource="[somewhere]/TAQDataRData/", trades=T, quotes=F)
>> #output of TAQLoad
>>      SYMBOL EX  PRICE   SIZE  COND CORR G127
>> <NA> "XXX"  "B" "11.49" "500" "@"  "0"  "0"
>> <NA> "XXX"  "B" "11.49" "322" "0"  "0"  "0"
>> <NA> "XXX"  "B" "11.49" "178" "@"  "0"  "0"
>> <NA> "XXX"  "B" "11.49" "500" "@"  "0"  "0"
>> <NA> "XXX"  "D" "11.49" "187" "@"  "0"  "0"
>> Warning message:
>> timezone of object (GMT) is different than current timezone ().
>>
>> The problem is the <NA>s. If one does not supply the date and time format
>> to the convert function, the standard NYSE format is assumed, and RTAQ
>> internally (convert_to_RData.r line 32) represents this as
>> "Y%M%D %H:%M:%S". Whilst this works fine for some things, it does not
>> work when a timeDate is initialised using this format
>> (convert_to_RData.r line 102). timeDate expects a correct format like
>> "%Y%m%d %H:%M:%S" rather than "Y%M%D %H:%M:%S".
>> Run the below two to confirm:
>> tdobject=timeDate:::timeDate(paste(as.vector("2010-10-11"),
>> as.vector("10:30:30")), format="%Y%M%D %H:%M:%S",
>> FinCenter="GMT", zone="GMT")
>> #tdobject is GMT [1] [NA]
>> tdobject=timeDate:::timeDate(paste(as.vector("20101011"),
>> as.vector("10:30:30")), format="%Y%m%d %H:%M:%S",
>> FinCenter="GMT", zone="GMT")
>> #tdobject is now GMT [1] [2010-10-11 10:30:30]
>>
>> Therefore, if one explicitly includes format="%Y%m%d %H:%M:%S" in the
>> convert function, everything works fine and the <NA> problem above is
>> solved; this is my solution. Can I please suggest that, once you
>> investigate this and provided that you confirm my understanding,
>> convert_to_RData.r is changed in order to use "%Y%m%d %H:%M:%S" as the
>> default format?
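The idea behind the workaround can be sketched without RTAQ installed. The helper parse_taq_time below is hypothetical (it is not part of RTAQ), and it uses base R's as.POSIXct in place of timeDate; both are assumptions made purely for illustration. The data are the five trade rows from step 2.

```r
# The sample trades file from step 2, read from an inline string.
trades_csv <- "SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
XXX,20101101,10:30:00,11.49,500,0,0,@,B
XXX,20101101,10:30:02,11.49,322,0,0,0,B
XXX,20101101,10:30:02,11.49,178,0,0,@,B
XXX,20101101,10:30:03,11.49,500,0,0,@,B
XXX,20101101,10:30:03,11.49,187,0,0,@,D"

# Read every column as character so that DATE stays "20101101".
tdata <- read.csv(text = trades_csv, colClasses = "character")

# Hypothetical helper (not an RTAQ function): paste DATE and TIME and
# parse them with an explicit, correctly cased format string.
parse_taq_time <- function(date, time,
                           format = "%Y%m%d %H:%M:%S", tz = "GMT") {
  as.POSIXct(paste(date, time), format = format, tz = tz)
}

idx <- parse_taq_time(tdata$DATE, tdata$TIME)

# Guard against a silently broken index before building a time series.
stopifnot(!any(is.na(idx)))
print(idx)
```

A check like the final stopifnot is cheap and would have surfaced the <NA> index at conversion time rather than after loading.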
>>
>> 4) My environment:
>> R version 2.15.1 (2012-06-22)
>> Platform: i686-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
>> [4] LC_COLLATE=C LC_MONETARY=en_GB LC_MESSAGES=en_GB
>> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] RTAQ_0.2 timeDate_2160.97 xts_0.8-6 zoo_1.7-8
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.15.1 lattice_0.20-6
>>
>>
>> Best wishes,
>> Nicolae
>>
>>
>>
>> On Fri, 12 Oct 2012 21:52:22 +0100, "R. Michael Weylandt"
>> <michael.weylandt at gmail.com> wrote:
>>> I'm forwarding this to the R-SIG-Finance list, where you'll have a more
>>> specialized audience.
>>>
>>> In the meantime, you may wish to look at
>>>
>>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>>>
>>> Finally, I note you're posting from Nabble. Please do include context in
>>> your reply -- I don't believe Nabble does this automatically, so
>>> you'll need to manually include it. Most of the regular respondents on
>>> these lists don't use Nabble -- it is a _mailing list_ after all -- so
>>> we don't get the forum view you do, only emails of the individual
>>> posts. Combine that with the high volume of posts, and it's quite
>>> difficult to trace a discussion if we all don't make sure to include
>>> context.
>>>
>>> Cheers,
>>> Michael
>>>
>>> On Fri, Oct 12, 2012 at 7:01 PM, caprarn9 <caprarn9 at cs.man.ac.uk> wrote:
>>>> Hello,
>>>>
>>>> I am closely following the RTAQ documentation in order to load my
>>>> dataset into R, however I get this warning when running the convert
>>>> function in the following way:
>>>>
>>>> convert(from="2010-11-01", to="2010-11-01", datasource=datasource,
>>>> datadestination=datadestination, trades=T, quotes=T, ticker="BAC", dir=T,
>>>> extention="csv", header=T, tradecolnames=c("SYMBOL", "DATE", "TIME",
>>>> "PRICE", "SIZE", "G127", "CORR", "COND", "EX"),
>>>> quotecolnames=c("SYMBOL",
>>>> "DATE", "TIME", "BID", "OFR", "BIDSIZ", "OFRSIZ", "MODE", "EX"))
>>>>
>>>> The only warning returned is:
>>>> In `[<-.factor`(`*tmp*`, is.na(tdata$G127), value = c(1L, 1L, 1L, :
>>>> invalid factor level, NAs generated
>>>>
>>>> As it is a warning, the .RData files still get created and I can use
>>>> TAQLoad to load them:
>>>>
>>>> x <- TAQLoad("BAC", from="2010-11-01", to="2010-11-01",
>>>> datasource=datadestination, trades=T, quotes=T)
>>>>
>>>> The PROBLEM:
>>>> head(x)
>>>>      SYMBOL EX  PRICE     SIZE   COND CORR G127
>>>> <NA> "BAC"  "B" "11.4900" " 500" "@"  "0"  "0"
>>>> ...
>>>>
>>>> This is the same for the quotes objects, but with different headers,
>>>> obviously. I get a <NA> instead of the expected YYYY-MM-DD HH:MM:SS
>>>> format for each observation.
>>>>
>>>> I've spent a fair number of hours trying to get this right, without
>>>> success. Can you please provide me with some guidance?
>>>>
>>>> Thank you.
>>>>
>>>> A sample from the CSV files I use:
>>>>
>>>> SYMBOL,DATE,TIME,BID,OFR,BIDSIZ,OFRSIZ,MODE,EX
>>>> BAC,20101101,9:30:00,11.5,11.51,5,116,12,P
>>>> ...
>>>>
>>>> SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
>>>> BAC,20101101,10:30:00,11.49,500,0,0,@,B
>>>> ...
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/RTAQ-convert-function-warning-causes-incorrect-loading-of-data-tp4646025.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>