[R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?
Kent Hoxsey
khoxsey at gmail.com
Wed Nov 21 21:28:40 CET 2012
Ah, now it would appear you are asking simple R questions (rather than complicated finance questions) so I am going to point you toward some good sources. You will definitely want to do a little spelunking in the help, and there's a ton of good stuff (including examples) for working with time series on Stack Overflow (esp search for topics tagged with "r" and "xts").
Display of the number of digits is controlled by the options() function, read about it with help(options) and set it with something like: options(digits=10). You will want to read up on POSIXct(), particularly the "Sub-second Accuracy" section.
Your SQL is most likely coming back into R as a dataframe (help(data.frame)) which you can look at using head() or tail() or str(). In combination with the digits settings, you will likely be able to see all of the digits you seek.
On Nov 21, 2012, at 12:22 PM, Michael Lyszczek wrote:
> I am now doing this :
>
> select cast(extract(epoch from timestamp) as float) from trades order by timestamp;
>
> this gives me a list that looks like .....
> 1353381185.089
> 1353381185.089
> 1353381189.96
> 1353381189.96
> 1353381189.96
> 1353381190.353
> 1353381190.353
> 1353381190.353
> 1353381191.139
> 1353381191.139
> 1353381191.139
> 1353381256.04
> 1353381855.805
> 1353381864.52
> 1353384140.324
>
> Etc....
>
> I try to bring it into R the same way, so new query:
>
> NGZ2zoo <- dbGetQuery(con, "select cast(extract(epoch from timestamp) as float), instrument_id, trade_size, trade_price, exchange_generated_price, trade_id, agressor_side_isbuy from trades where trades.instrument_id = (select CAST(max(securities.\"UnderlyingSymbol\") as INT) from securities where securities.\"FNYShortName\" = 'iNGZ2')");
> Summary(NGZ2zoo)
> date_part instrument_id trade_size trade_price
> Min. :1.35e+09 Min. :584445 Min. : 2500 Min. :3708
> 1st Qu.:1.35e+09 1st Qu.:584445 1st Qu.: 2500 1st Qu.:3742
> Median :1.35e+09 Median :584445 Median : 2500 Median :3760
> Mean :1.35e+09 Mean :584445 Mean : 3070 Mean :3759
> 3rd Qu.:1.35e+09 3rd Qu.:584445 3rd Qu.: 2500 3rd Qu.:3776
> Max. :1.35e+09 Max. :584445 Max. :70000 Max. :3833
> exchange_generated_price trade_id agressor_side_isbuy
> Mode :logical Min. :65001155 Min. :-1.000
> FALSE:2761 1st Qu.:65767044 1st Qu.:-1.000
> TRUE :474 Median :66313346 Median : 0.000
> NA's :0 Mean :66306294 Mean :-0.068
> 3rd Qu.:66880936 3rd Qu.: 1.000
> Max. :67377417 Max. : 1.000
>
> as.POSIXct(NGZ2zoo$date_part, origin="1970-01-01")
> yields:
> [3199] "2012-11-20 02:12:29 EST" "2012-11-20 00:29:30 EST"
> [3201] "2012-11-20 02:15:27 EST" "2012-11-20 00:29:31 EST"
> [3203] "2012-11-20 00:29:31 EST" "2012-11-20 00:29:33 EST"
> [3205] "2012-11-20 02:15:28 EST" "2012-11-20 02:15:33 EST"
> [3207] "2012-11-20 02:15:33 EST" "2012-11-20 02:16:11 EST"
> [3209] "2012-11-20 02:18:21 EST" "2012-11-20 02:18:58 EST"
> [3211] "2012-11-20 00:29:33 EST" "2012-11-20 00:29:34 EST"
> [3213] "2012-11-20 00:29:40 EST" "2012-11-20 02:22:04 EST"
> [3215] "2012-11-20 00:29:43 EST" "2012-11-20 02:22:31 EST"
> [3217] "2012-11-20 02:22:33 EST" "2012-11-20 02:26:23 EST"
> [3219] "2012-11-20 02:26:23 EST" "2012-11-20 02:27:53 EST"
> [3221] "2012-11-20 02:29:30 EST" "2012-11-20 02:29:36 EST"
> [3223] "2012-11-20 02:29:39 EST" "2012-11-20 02:47:05 EST"
> [3225] "2012-11-20 02:47:05 EST" "2012-11-20 02:47:05 EST"
> [3227] "2012-11-20 02:51:25 EST" "2012-11-20 02:53:37 EST"
> [3229] "2012-11-20 02:53:44 EST" "2012-11-20 02:55:53 EST"
> [3231] "2012-11-20 02:58:20 EST" "2012-11-20 03:14:16 EST"
> [3233] "2012-11-20 03:24:15 EST" "2012-11-20 03:24:24 EST"
> [3235] "2012-11-20 04:02:20 EST"
>
>
> Even if I just do the system time:
>> as.POSIXct(Sys.time(), format = "%Y-%m-%d %H:%M:%OS")
> [1] "2012-11-21 15:07:59 EST"
>
>
> Is it just stored as a float? Or numeric?
>
> Any way you can see the actual value, not scientific notation?
>
> Thanks so much,
>
> Mike
>
> -----Original Message-----
> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Kent Hoxsey
> Sent: Wednesday, November 21, 2012 3:03 PM
> To: r-sig-finance at r-project.org
> Subject: Re: [R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?
>
> After working through a similar set of issues a while back, I discovered this very useful web page:
>
> http://www.epochconverter.com/
>
> Short summary: use your RDBMS to do the conversion to epoch time, and hand that number to R. Do not write SQL to write text to parse in R.
>
> In my experience, the task of converting from a string representation to POSIXct representation in R can be quite, erm, tedious when you're working with a lot of data. In contrast, doing the conversion in a source database (Oracle for me) is really fast, as are the downstream operations in R.
>
> Caveat: I have not attempted to preserve or work with the millis, so you might need to do some prototyping to get your conversion right coming out Postgres, but that web site includes quite a bit of discussion about the topic.
>
> HTH.
> On Nov 21, 2012, at 10:16 AM, Michael Lyszczek wrote:
>
>> Hi,
>>
>> I am new to R and have been really having a great time dealing with millisecond timestamps in financial data.
>>
>> I can get them to work in R using read.csv() then then converting the time stamp column using as.POSIXct(DATA$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
>>
>> However, when I use DBI and PgSql to query a table for ticks / bbo I have no idea how to specify not to drop off the trailing millisecond.
>> <code>
>>
>> options(digs.secs = 3);
>>
>> library("DBI")
>> library("RPostgreSQL")
>>
>> drv <- dbDriver("PostgreSQL")
>> con <- dbConnect(drv, dbname="market_data", host="host", user="user", password="password")
>> NGZ2 <- dbGetQuery(con, "select * from trades blah blah blah");
>>
>> </code>
>>
>> Summary(NGZ2) yields:
>>> summary(NGZ2zoo)
>> instrument_id table_trade_id trade_size trade_price
>> Min. :584445 Min. :2786130 Min. : 2500 Min. :3708
>> 1st Qu.:584445 1st Qu.:2790335 1st Qu.: 2500 1st Qu.:3742
>> Median :584445 Median :2795493 Median : 2500 Median :3760
>> Mean :584445 Mean :2795617 Mean : 3070 Mean :3759
>> 3rd Qu.:584445 3rd Qu.:2800727 3rd Qu.: 2500 3rd Qu.:3776
>> Max. :584445 Max. :2806760 Max. :70000 Max. :3833
>> exchange_generated_price trade_id timestamp
>> Mode :logical Min. :65001155 Min. :2012-11-19 00:49:07
>> FALSE:2761 1st Qu.:65767044 1st Qu.:2012-11-19 14:25:28
>> TRUE :474 Median :66313346 Median :2012-11-19 16:25:25
>> NA's :0 Mean :66306294 Mean :2012-11-19 16:23:35
>> 3rd Qu.:66880936 3rd Qu.:2012-11-19 18:50:35
>> Max. :67377417 Max. :2012-11-19 23:02:20
>> agressor_side_isbuy
>> Min. :-1.00000
>> 1st Qu.:-1.00000
>> Median : 0.00000
>> Mean :-0.06801
>> 3rd Qu.: 1.00000
>> Max. : 1.00000
>>
>>
>> Meanwhile using a read.csv() for a similar dataset yields:
>>
>>
>> NGX2 <- read.csv("iNGCal1310-12-2012.csv", header=TRUE)
>>
>>
>>
>>> summary(NGX2)
>>
>> TIMESTAMP BID_SIZE BID
>>
>> 2012-10-22 16:04:19.561: 12 Min. : 2500 Min. :3.795
>>
>> 2012-10-22 13:21:33.533: 11 1st Qu.: 2500 1st Qu.:3.911
>>
>> 2012-10-22 13:54:06.927: 11 Median : 2500 Median :3.927
>>
>> 2012-10-22 14:44:43.331: 11 Mean : 4356 Mean :3.929
>>
>> 2012-10-22 15:50:03.755: 11 3rd Qu.: 5000 3rd Qu.:3.943
>>
>> 2012-10-22 16:09:08.757: 11 Max. :15000 Max. :3.989
>>
>> (Other) :119621
>>
>> ASK ASK_SIZE
>>
>> Min. :3.904 Min. : 2500
>>
>> 1st Qu.:3.919 1st Qu.: 2500
>>
>> Median :3.935 Median : 5000
>>
>> Mean :3.938 Mean : 4667
>>
>> 3rd Qu.:3.950 3rd Qu.: 7500
>>
>> Max. :4.007 Max. :15000
>>
>>
>>
>> After processing with as.PosixCT:
>>
>>
>>
>>> NGX2$TS <-as.POSIXct(NGX2$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
>>
>>> summary(NGX2)
>>
>> TIMESTAMP BID_SIZE BID
>>
>> 2012-10-22 16:04:19.561: 12 Min. : 2500 Min. :3.795
>>
>> 2012-10-22 13:21:33.533: 11 1st Qu.: 2500 1st Qu.:3.911
>>
>> 2012-10-22 13:54:06.927: 11 Median : 2500 Median :3.927
>>
>> 2012-10-22 14:44:43.331: 11 Mean : 4356 Mean :3.929
>>
>> 2012-10-22 15:50:03.755: 11 3rd Qu.: 5000 3rd Qu.:3.943
>>
>> 2012-10-22 16:09:08.757: 11 Max. :15000 Max. :3.989
>>
>> (Other) :119621
>>
>> ASK ASK_SIZE TS
>>
>> Min. :3.904 Min. : 2500 Min. :2012-10-22 00:00:00
>>
>> 1st Qu.:3.919 1st Qu.: 2500 1st Qu.:2012-10-22 13:59:36
>>
>> Median :3.935 Median : 5000 Median :2012-10-22 14:56:57
>>
>> Mean :3.938 Mean : 4667 Mean :2012-10-22 14:52:21
>>
>> 3rd Qu.:3.950 3rd Qu.: 7500 3rd Qu.:2012-10-22 16:04:16
>>
>> Max. :4.007 Max. :15000 Max. :2012-10-22 16:52:08
>>
>>
>>
>>
>> Is there any way to specify date format for a dbGetQuery () ?
>>
>>
>> Straight from SQL:
>> instrument_id | timestamp | best_bid_size | best_bid | best_offer | best_offer_size | data_id
>> ---------------+----------------------------+---------------+----------+------------+-----------------+----------
>> 592802 | 2012-11-19 13:39:52.855-05 | 5000 | 3.864 | 3.872 | 7500 | 17632624
>> 1461673 | 2012-11-19 13:39:52.855-05 | 5000 | -0.26 | -0.237 | 2500 | 17632625
>> 592790 | 2012-11-19 13:39:52.855-05 | 5000 | 3.951 | 3.96 | 2500 | 17632626
>> 595724 | 2012-11-19 13:39:52.855-05 | 7500 | -0.404 | -0.37 | 2500 | 17632627
>> 592792 | 2012-11-19 13:39:52.855-05 | 7500 | 3.897 | 3.9 | 10000 | 17632628
>> 592796 | 2012-11-19 13:39:52.855-05 | 2500 | 3.865 | 3.868 | 10000 | 17632629
>> 592792 | 2012-11-19 13:39:52.855-05 | 10000 | 3.897 | 3.9 | 10000 | 17632630
>> 592796 | 2012-11-19 13:39:52.855-05 | 5000 | 3.865 | 3.868 | 10000 | 17632631
>> 592787 | 2012-11-19 13:39:52.856-05 | 10000 | 3.837 | 3.842 | 10000 | 17632632
>> 592792 | 2012-11-19 13:39:52.856-05 | 2500 | 3.898 | 3.9 | 10000 | 17632633
>> 592793 | 2012-11-19 13:39:52.856-05 | 10000 | 3.887 | 3.891 | 10000 | 17632634
>> 592796 | 2012-11-19 13:39:52.856-05 | 7500 | 3.865 | 3.868 | 10000 | 17632635
>> 592787 | 2012-11-19 13:39:52.856-05 | 12500 | 3.837 | 3.842 | 10000 | 17632636
>> 592792 | 2012-11-19 13:39:52.856-05 | 5000 | 3.898 | 3.9 | 10000 | 17632637
>> 592793 | 2012-11-19 13:39:52.856-05 | 12500 | 3.887 | 3.891 | 10000 | 17632638
>> 592796 | 2012-11-19 13:39:52.856-05 | 10000 | 3.865 | 3.868 | 10000 | 17632639
>> 592787 | 2012-11-19 13:39:52.856-05 | 2500 | 3.838 | 3.842 | 10000 | 17632640
>>
>>
>> **********************************************************************
>> This e-mail is confidential and for the individual or entity named above and may contain privileged information. If you are not the intended recipient, please notify us immediately by return e-mail or telephone and destroy this message. This communication is for informational purp...{{dropped:22}}
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions should go.
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.
More information about the R-SIG-Finance
mailing list