[R-SIG-Finance] [R] RTAQ - convert function: warning causes incorrect loading of data

Jeff Ryan jeff.a.ryan at gmail.com
Sat Oct 13 19:33:01 CEST 2012


FWIW %m is the proper conversion for months. %M is minutes. 

Looks like a bug. 
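
A quick way to see the difference, as a sketch only: this uses base R's strptime rather than timeDate or RTAQ itself, and the timestamp is just the DATE and TIME fields from the sample CSV quoted below pasted together. The variable names are made up for illustration.

```r
# Base R only; no RTAQ or timeDate needed for this check.
stamp <- "20101101 10:30:00"

# %m = month, %d = day of month: the string matches the format.
good <- as.POSIXct(strptime(stamp, format = "%Y%m%d %H:%M:%S", tz = "GMT"))

# %M = minutes, %D = shorthand for %m/%d/%y: the string cannot match,
# so strptime returns NA instead of raising an error.
bad <- as.POSIXct(strptime(stamp, format = "%Y%M%D %H:%M:%S", tz = "GMT"))

print(format(good, "%Y-%m-%d %H:%M:%S"))
print(is.na(bad))
```

Because a failed parse comes back as NA rather than as an error, a wrong default format would go unnoticed until the index is inspected.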

Jeffrey Ryan    |    Founder    |    jeffrey.ryan at lemnica.com

www.lemnica.com

On Oct 13, 2012, at 10:33 AM, Nicolae Caprarescu <caprarn9 at cs.man.ac.uk> wrote:

> Hi Michael,
> 
> Thank you for pointing me in the right direction, I'm now using an email
> client rather than Nabble.
> 
> Regarding the issue I described below: it is now resolved, as I have
> managed to fix it myself. However, I believe this might be a bug, or at
> least something that needs improving. The 4 steps below describe both how
> to reproduce the issue and how to solve it:
> 1) library(RTAQ)
> 2) Create XXX_trades.csv file with the contents below using a relative
> path like [somewhere]/TAQData/2010-11-01/XXX_trades.csv
> SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
> XXX,20101101,10:30:00,11.49,500,0,0,@,B
> XXX,20101101,10:30:02,11.49,322,0,0,0,B
> XXX,20101101,10:30:02,11.49,178,0,0,@,B
> XXX,20101101,10:30:03,11.49,500,0,0,@,B
> XXX,20101101,10:30:03,11.49,187,0,0,@,D
> 3) 
> # convert does not generate any errors/warnings, however it does not work properly
> convert(from="2010-11-01", to="2010-11-01",
>         datasource="[somewhere]/TAQData/",
>         datadestination="[somewhere]/TAQDataRData/",
>         trades=TRUE, quotes=FALSE, ticker="XXX", dir=TRUE,
>         extention="csv", header=TRUE,
>         tradecolnames=c("SYMBOL", "DATE", "TIME", "PRICE", "SIZE",
>                         "G127", "CORR", "COND", "EX"))
> # loading the RData created by convert
> TAQLoad("XXX", from="2010-11-01", to="2010-11-01",
>         datasource="[somewhere]/TAQDataRData/", trades=TRUE, quotes=FALSE)
> #output of TAQLoad 
>     SYMBOL EX  PRICE   SIZE  COND CORR G127
> <NA> "XXX"  "B" "11.49" "500" "@"  "0"  "0" 
> <NA> "XXX"  "B" "11.49" "322" "0"  "0"  "0" 
> <NA> "XXX"  "B" "11.49" "178" "@"  "0"  "0" 
> <NA> "XXX"  "B" "11.49" "500" "@"  "0"  "0" 
> <NA> "XXX"  "D" "11.49" "187" "@"  "0"  "0" 
> Warning message:
> timezone of object (GMT) is different than current timezone (). 
> 
> The problem is the <NA>s shown in place of the timestamps. If one does not
> supply the format of date and time to the convert function, the standard
> NYSE format is assumed, and RTAQ internally (convert_to_RData.r line 32)
> represents this as "%Y%M%D %H:%M:%S". Whilst this works fine for some
> things, it does not work when a timeDate is initialised using this format
> (convert_to_RData.r line 102): timeDate expects a valid format such as
> "%Y%m%d %H:%M:%S" rather than "%Y%M%D %H:%M:%S".
> Run the two calls below to confirm: 
> tdobject=timeDate::timeDate(paste("20101011", "10:30:30"),
>     format="%Y%M%D %H:%M:%S", FinCenter="GMT", zone="GMT")
> #tdobject is GMT [1] [NA]
> tdobject=timeDate::timeDate(paste("20101011", "10:30:30"),
>     format="%Y%m%d %H:%M:%S", FinCenter="GMT", zone="GMT")
> #tdobject is now GMT [1] [2010-10-11 10:30:30]
> 
> Therefore, if one explicitly includes format="%Y%m%d %H:%M:%S" in the
> convert function, everything works fine and the <NA> problem above is
> solved; this is my solution. Can I please suggest that, once you
> investigate this and provided that you confirm my understanding,
> convert_to_RData.r is changed in order to use "%Y%m%d %H:%M:%S" as the
> default format? 
> 
> 4) My environment:
> R version 2.15.1 (2012-06-22)
> Platform: i686-pc-linux-gnu (32-bit)
> 
> locale:
> [1] LC_CTYPE=en_GB       LC_NUMERIC=C         LC_TIME=en_GB       
> [4] LC_COLLATE=C         LC_MONETARY=en_GB    LC_MESSAGES=en_GB   
> [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_GB LC_IDENTIFICATION=C 
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] RTAQ_0.2         timeDate_2160.97 xts_0.8-6        zoo_1.7-8       
> 
> loaded via a namespace (and not attached):
> [1] grid_2.15.1    lattice_0.20-6
> 
> 
> Best wishes,
> Nicolae
> 
> 
> 
> On Fri, 12 Oct 2012 21:52:22 +0100, "R. Michael Weylandt"
> <michael.weylandt at gmail.com> wrote:
>> I'm forwarding this to the R-SIG-Finance list, where you'll have a more
>> specialized audience.
>> 
>> In the meanwhile, you may wish to look at
>> 
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> 
>> Finally, I note you're posting from Nabble. Please do include context in
>> your reply -- I don't believe Nabble does this automatically, so
>> you'll need to manually include it. Most of the regular respondents on
>> these lists don't use Nabble -- it is a _mailing list_ after all -- so
>> we don't get the forum view you do, only emails of the individual
>> posts. Combine that with the high volume of posts, and it's quite
>> difficult to trace a discussion if we all don't make sure to include
>> context.
>> 
>> Cheers,
>> Michael
>> 
>> On Fri, Oct 12, 2012 at 7:01 PM, caprarn9 <caprarn9 at cs.man.ac.uk> wrote:
>>> Hello,
>>> 
>>> I am closely following the RTAQ documentation in order to load my dataset
>>> into R, however I get this warning when running the convert function in
>>> the following way:
>>> 
>>> convert(from="2010-11-01", to="2010-11-01",datasource=datasource,
>>> datadestination=datadestination,trades=T,quotes=T,ticker="BAC",dir=T,
>>> extention="csv", header=T, tradecolnames=c("SYMBOL", "DATE", "TIME",
>>> "PRICE", "SIZE", "G127", "CORR", "COND", "EX"),
>>> quotecolnames=c("SYMBOL", "DATE", "TIME", "BID", "OFR", "BIDSIZ",
>>> "OFRSIZ", "MODE", "EX"))
>>> 
>>> The only warning returned is:
>>> In `[<-.factor`(`*tmp*`, is.na(tdata$G127), value = c(1L, 1L, 1L,  :
>>>  invalid factor level, NAs generated
>>> 
>>> As it is a warning, the .RData files still get created and I can use
>>> TAQLoad to load them:
>>> 
>>> x <- TAQLoad("BAC",from="2010-11-01",to="2010-11-01",
>>> datasource=datadestination,trades=T,quotes=T)
>>> 
>>> The PROBLEM:
>>> head(x)
>>>     SYMBOL EX  PRICE     SIZE    COND CORR G127
>>> <NA> "BAC"  "B" "11.4900" "  500" "@"  "0"  "0"
>>> ...
>>> 
>>> This is the same for the quotes objects, but with different headers,
>>> obviously. I get a <NA> instead of the expected YYYY-MM-DD HH:MM:SS
>>> format for each observation.
>>> 
>>> I've spent a fair number of hours trying to get this right, with no
>>> success. Can you please provide me with some guidance?
>>> 
>>> Thank you.
>>> 
>>> A sample from the CSV files I use:
>>> 
>>> SYMBOL,DATE,TIME,BID,OFR,BIDSIZ,OFRSIZ,MODE,EX
>>> BAC,20101101,9:30:00,11.5,11.51,5,116,12,P
>>> ...
>>> 
>>> SYMBOL,DATE,TIME,PRICE,SIZE,G127,CORR,COND,EX
>>> BAC,20101101,10:30:00,11.49,500,0,0,@,B
>>> ...
>>> 
> 


