[R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Any way to set up a default timeseries format for the timestamp type?
Michael Lyszczek
m.lyszczek at firstny.com
Wed Nov 21 21:22:47 CET 2012
Kent,
Thanks for the response.
I am now doing this:
select cast(extract(epoch from timestamp) as float) from trades order by timestamp;
This gives me a list that looks like this:
1353381185.089
1353381185.089
1353381189.96
1353381189.96
1353381189.96
1353381190.353
1353381190.353
1353381190.353
1353381191.139
1353381191.139
1353381191.139
1353381256.04
1353381855.805
1353381864.52
1353384140.324
Etc....
I try to bring it into R the same way, so the new query is:
NGZ2zoo <- dbGetQuery(con, "select cast(extract(epoch from timestamp) as float), instrument_id, trade_size, trade_price, exchange_generated_price, trade_id, agressor_side_isbuy from trades where trades.instrument_id = (select CAST(max(securities.\"UnderlyingSymbol\") as INT) from securities where securities.\"FNYShortName\" = 'iNGZ2')");
summary(NGZ2zoo)
date_part instrument_id trade_size trade_price
Min. :1.35e+09 Min. :584445 Min. : 2500 Min. :3708
1st Qu.:1.35e+09 1st Qu.:584445 1st Qu.: 2500 1st Qu.:3742
Median :1.35e+09 Median :584445 Median : 2500 Median :3760
Mean :1.35e+09 Mean :584445 Mean : 3070 Mean :3759
3rd Qu.:1.35e+09 3rd Qu.:584445 3rd Qu.: 2500 3rd Qu.:3776
Max. :1.35e+09 Max. :584445 Max. :70000 Max. :3833
exchange_generated_price trade_id agressor_side_isbuy
Mode :logical Min. :65001155 Min. :-1.000
FALSE:2761 1st Qu.:65767044 1st Qu.:-1.000
TRUE :474 Median :66313346 Median : 0.000
NA's :0 Mean :66306294 Mean :-0.068
3rd Qu.:66880936 3rd Qu.: 1.000
Max. :67377417 Max. : 1.000
as.POSIXct(NGZ2zoo$date_part, origin="1970-01-01")
yields:
[3199] "2012-11-20 02:12:29 EST" "2012-11-20 00:29:30 EST"
[3201] "2012-11-20 02:15:27 EST" "2012-11-20 00:29:31 EST"
[3203] "2012-11-20 00:29:31 EST" "2012-11-20 00:29:33 EST"
[3205] "2012-11-20 02:15:28 EST" "2012-11-20 02:15:33 EST"
[3207] "2012-11-20 02:15:33 EST" "2012-11-20 02:16:11 EST"
[3209] "2012-11-20 02:18:21 EST" "2012-11-20 02:18:58 EST"
[3211] "2012-11-20 00:29:33 EST" "2012-11-20 00:29:34 EST"
[3213] "2012-11-20 00:29:40 EST" "2012-11-20 02:22:04 EST"
[3215] "2012-11-20 00:29:43 EST" "2012-11-20 02:22:31 EST"
[3217] "2012-11-20 02:22:33 EST" "2012-11-20 02:26:23 EST"
[3219] "2012-11-20 02:26:23 EST" "2012-11-20 02:27:53 EST"
[3221] "2012-11-20 02:29:30 EST" "2012-11-20 02:29:36 EST"
[3223] "2012-11-20 02:29:39 EST" "2012-11-20 02:47:05 EST"
[3225] "2012-11-20 02:47:05 EST" "2012-11-20 02:47:05 EST"
[3227] "2012-11-20 02:51:25 EST" "2012-11-20 02:53:37 EST"
[3229] "2012-11-20 02:53:44 EST" "2012-11-20 02:55:53 EST"
[3231] "2012-11-20 02:58:20 EST" "2012-11-20 03:14:16 EST"
[3233] "2012-11-20 03:24:15 EST" "2012-11-20 03:24:24 EST"
[3235] "2012-11-20 04:02:20 EST"
Even if I just print the system time, the milliseconds don't show:
> as.POSIXct(Sys.time(), format = "%Y-%m-%d %H:%M:%OS")
[1] "2012-11-21 15:07:59 EST"
Is it just stored as a float, or as numeric?
Is there any way to see the actual value rather than scientific notation?
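(One way to check, assuming date_part is the epoch column returned above: the value should come back from the driver as an R double, i.e. numeric, and the scientific notation in summary() is only a display default.)

class(NGZ2zoo$date_part)                  # "numeric" (a double)
print(NGZ2zoo$date_part[1], digits = 15)  # full value, e.g. 1353381185.089
sprintf("%.3f", NGZ2zoo$date_part[1])     # fixed-point string with the milliseconds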
Thanks so much,
Mike
-----Original Message-----
From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Kent Hoxsey
Sent: Wednesday, November 21, 2012 3:03 PM
To: r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Any way to set up a default timeseries format for the timestamp type?
After working through a similar set of issues a while back, I discovered this very useful web page:
http://www.epochconverter.com/
Short summary: use your RDBMS to do the conversion to epoch time and hand that number to R, rather than writing SQL that returns text for R to parse.
In my experience, the task of converting from a string representation to POSIXct representation in R can be quite, erm, tedious when you're working with a lot of data. In contrast, doing the conversion in a source database (Oracle for me) is really fast, as are the downstream operations in R.
Caveat: I have not attempted to preserve or work with the millis, so you might need to do some prototyping to get your conversion right coming out of Postgres, but that web site includes quite a bit of discussion about the topic.
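As a rough Postgres-flavored sketch of that approach (untested here; the table and column names are just the ones from your post, and you still need options(digits.secs) set to see the fraction when R prints the result):

dat <- dbGetQuery(con, 'select extract(epoch from "timestamp") as ts_epoch, trade_price
                        from trades order by "timestamp"')
options(digits.secs = 3)
dat$ts <- as.POSIXct(dat$ts_epoch, origin = "1970-01-01")  # the fraction rides along in the double
head(dat$ts)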
HTH.
On Nov 21, 2012, at 10:16 AM, Michael Lyszczek wrote:
> Hi,
>
> I am new to R and have been really having a great time dealing with millisecond timestamps in financial data.
>
> I can get them to work in R by using read.csv() and then converting the timestamp column with as.POSIXct(DATA$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS").
>
> However, when I use DBI and PgSql to query a table for ticks / bbo, I have no idea how to keep the trailing milliseconds from being dropped.
> <code>
>
> options(digits.secs = 3)   # controls how many fractional-second digits R prints
>
> library("DBI")
> library("RPostgreSQL")
>
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, dbname="market_data", host="host", user="user", password="password")
> NGZ2 <- dbGetQuery(con, "select * from trades blah blah blah")
>
> </code>
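> (A minimal check, assuming dbGetQuery() hands the timestamp column back as POSIXct, of whether the milliseconds are actually stored and merely hidden by the default print settings:)
>
> options(digits.secs = 3)
> class(NGZ2$timestamp)                               # POSIXct if the driver converted it
> format(NGZ2$timestamp[1:5], "%Y-%m-%d %H:%M:%OS3")  # shows stored milliseconds, if any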
>
> summary(NGZ2) yields:
> instrument_id table_trade_id trade_size trade_price
> Min. :584445 Min. :2786130 Min. : 2500 Min. :3708
> 1st Qu.:584445 1st Qu.:2790335 1st Qu.: 2500 1st Qu.:3742
> Median :584445 Median :2795493 Median : 2500 Median :3760
> Mean :584445 Mean :2795617 Mean : 3070 Mean :3759
> 3rd Qu.:584445 3rd Qu.:2800727 3rd Qu.: 2500 3rd Qu.:3776
> Max. :584445 Max. :2806760 Max. :70000 Max. :3833
> exchange_generated_price trade_id timestamp
> Mode :logical Min. :65001155 Min. :2012-11-19 00:49:07
> FALSE:2761 1st Qu.:65767044 1st Qu.:2012-11-19 14:25:28
> TRUE :474 Median :66313346 Median :2012-11-19 16:25:25
> NA's :0 Mean :66306294 Mean :2012-11-19 16:23:35
> 3rd Qu.:66880936 3rd Qu.:2012-11-19 18:50:35
> Max. :67377417 Max. :2012-11-19 23:02:20
> agressor_side_isbuy
> Min. :-1.00000
> 1st Qu.:-1.00000
> Median : 0.00000
> Mean :-0.06801
> 3rd Qu.: 1.00000
> Max. : 1.00000
>
>
> Meanwhile using a read.csv() for a similar dataset yields:
>
> NGX2 <- read.csv("iNGCal1310-12-2012.csv", header=TRUE)
>
>> summary(NGX2)
>                    TIMESTAMP         BID_SIZE          BID
>  2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
>  2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
>  2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
>  2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
>  2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
>  2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
>  (Other)                 :119621
>       ASK           ASK_SIZE
>  Min.   :3.904   Min.   : 2500
>  1st Qu.:3.919   1st Qu.: 2500
>  Median :3.935   Median : 5000
>  Mean   :3.938   Mean   : 4667
>  3rd Qu.:3.950   3rd Qu.: 7500
>  Max.   :4.007   Max.   :15000
>
> After processing with as.POSIXct():
>
>> NGX2$TS <- as.POSIXct(NGX2$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS")
>> summary(NGX2)
>                    TIMESTAMP         BID_SIZE          BID
>  2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
>  2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
>  2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
>  2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
>  2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
>  2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
>  (Other)                 :119621
>       ASK           ASK_SIZE            TS
>  Min.   :3.904   Min.   : 2500   Min.   :2012-10-22 00:00:00
>  1st Qu.:3.919   1st Qu.: 2500   1st Qu.:2012-10-22 13:59:36
>  Median :3.935   Median : 5000   Median :2012-10-22 14:56:57
>  Mean   :3.938   Mean   : 4667   Mean   :2012-10-22 14:52:21
>  3rd Qu.:3.950   3rd Qu.: 7500   3rd Qu.:2012-10-22 16:04:16
>  Max.   :4.007   Max.   :15000   Max.   :2012-10-22 16:52:08
>
> Is there any way to specify a date format for dbGetQuery()?
>
> Straight from SQL:
> instrument_id | timestamp | best_bid_size | best_bid | best_offer | best_offer_size | data_id
> ---------------+----------------------------+---------------+----------+------------+-----------------+----------
> 592802 | 2012-11-19 13:39:52.855-05 | 5000 | 3.864 | 3.872 | 7500 | 17632624
> 1461673 | 2012-11-19 13:39:52.855-05 | 5000 | -0.26 | -0.237 | 2500 | 17632625
> 592790 | 2012-11-19 13:39:52.855-05 | 5000 | 3.951 | 3.96 | 2500 | 17632626
> 595724 | 2012-11-19 13:39:52.855-05 | 7500 | -0.404 | -0.37 | 2500 | 17632627
> 592792 | 2012-11-19 13:39:52.855-05 | 7500 | 3.897 | 3.9 | 10000 | 17632628
> 592796 | 2012-11-19 13:39:52.855-05 | 2500 | 3.865 | 3.868 | 10000 | 17632629
> 592792 | 2012-11-19 13:39:52.855-05 | 10000 | 3.897 | 3.9 | 10000 | 17632630
> 592796 | 2012-11-19 13:39:52.855-05 | 5000 | 3.865 | 3.868 | 10000 | 17632631
> 592787 | 2012-11-19 13:39:52.856-05 | 10000 | 3.837 | 3.842 | 10000 | 17632632
> 592792 | 2012-11-19 13:39:52.856-05 | 2500 | 3.898 | 3.9 | 10000 | 17632633
> 592793 | 2012-11-19 13:39:52.856-05 | 10000 | 3.887 | 3.891 | 10000 | 17632634
> 592796 | 2012-11-19 13:39:52.856-05 | 7500 | 3.865 | 3.868 | 10000 | 17632635
> 592787 | 2012-11-19 13:39:52.856-05 | 12500 | 3.837 | 3.842 | 10000 | 17632636
> 592792 | 2012-11-19 13:39:52.856-05 | 5000 | 3.898 | 3.9 | 10000 | 17632637
> 592793 | 2012-11-19 13:39:52.856-05 | 12500 | 3.887 | 3.891 | 10000 | 17632638
> 592796 | 2012-11-19 13:39:52.856-05 | 10000 | 3.865 | 3.868 | 10000 | 17632639
> 592787 | 2012-11-19 13:39:52.856-05 | 2500 | 3.838 | 3.842 | 10000 | 17632640
>
>
_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.