[R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?

Michael Lyszczek m.lyszczek at firstny.com
Wed Nov 21 21:22:47 CET 2012


Kent,

Thanks for the response.

I am now doing this : 

select cast(extract(epoch from timestamp) as float) from trades order by timestamp;

this gives me a list that looks like .....
1353381185.089
 1353381185.089
  1353381189.96
  1353381189.96
  1353381189.96
 1353381190.353
 1353381190.353
 1353381190.353
 1353381191.139
 1353381191.139
 1353381191.139
  1353381256.04
 1353381855.805
  1353381864.52
 1353384140.324

Etc....

I try to bring it into R the same way, so new query:

NGZ2zoo <- dbGetQuery(con, "select cast(extract(epoch from timestamp) as float), instrument_id, trade_size, trade_price, exchange_generated_price, trade_id, agressor_side_isbuy from trades where trades.instrument_id = (select CAST(max(securities.\"UnderlyingSymbol\") as INT) from securities where securities.\"FNYShortName\" = 'iNGZ2')");
Summary(NGZ2zoo)
   date_part        instrument_id      trade_size     trade_price  
 Min.   :1.35e+09   Min.   :584445   Min.   : 2500   Min.   :3708  
 1st Qu.:1.35e+09   1st Qu.:584445   1st Qu.: 2500   1st Qu.:3742  
 Median :1.35e+09   Median :584445   Median : 2500   Median :3760  
 Mean   :1.35e+09   Mean   :584445   Mean   : 3070   Mean   :3759  
 3rd Qu.:1.35e+09   3rd Qu.:584445   3rd Qu.: 2500   3rd Qu.:3776  
 Max.   :1.35e+09   Max.   :584445   Max.   :70000   Max.   :3833  
 exchange_generated_price    trade_id        agressor_side_isbuy
 Mode :logical            Min.   :65001155   Min.   :-1.000     
 FALSE:2761               1st Qu.:65767044   1st Qu.:-1.000     
 TRUE :474                Median :66313346   Median : 0.000     
 NA's :0                  Mean   :66306294   Mean   :-0.068     
                          3rd Qu.:66880936   3rd Qu.: 1.000     
                          Max.   :67377417   Max.   : 1.000     

as.POSIXct(NGZ2zoo$date_part, origin="1970-01-01")
yields:
[3199] "2012-11-20 02:12:29 EST" "2012-11-20 00:29:30 EST"
[3201] "2012-11-20 02:15:27 EST" "2012-11-20 00:29:31 EST"
[3203] "2012-11-20 00:29:31 EST" "2012-11-20 00:29:33 EST"
[3205] "2012-11-20 02:15:28 EST" "2012-11-20 02:15:33 EST"
[3207] "2012-11-20 02:15:33 EST" "2012-11-20 02:16:11 EST"
[3209] "2012-11-20 02:18:21 EST" "2012-11-20 02:18:58 EST"
[3211] "2012-11-20 00:29:33 EST" "2012-11-20 00:29:34 EST"
[3213] "2012-11-20 00:29:40 EST" "2012-11-20 02:22:04 EST"
[3215] "2012-11-20 00:29:43 EST" "2012-11-20 02:22:31 EST"
[3217] "2012-11-20 02:22:33 EST" "2012-11-20 02:26:23 EST"
[3219] "2012-11-20 02:26:23 EST" "2012-11-20 02:27:53 EST"
[3221] "2012-11-20 02:29:30 EST" "2012-11-20 02:29:36 EST"
[3223] "2012-11-20 02:29:39 EST" "2012-11-20 02:47:05 EST"
[3225] "2012-11-20 02:47:05 EST" "2012-11-20 02:47:05 EST"
[3227] "2012-11-20 02:51:25 EST" "2012-11-20 02:53:37 EST"
[3229] "2012-11-20 02:53:44 EST" "2012-11-20 02:55:53 EST"
[3231] "2012-11-20 02:58:20 EST" "2012-11-20 03:14:16 EST"
[3233] "2012-11-20 03:24:15 EST" "2012-11-20 03:24:24 EST"
[3235] "2012-11-20 04:02:20 EST"


Even if I just do the system time:
> as.POSIXct(Sys.time(),  format = "%Y-%m-%d %H:%M:%OS")
[1] "2012-11-21 15:07:59 EST"


Is it just stored as a float? Or numeric? 

Any way you can see the actual value, not scientific notation?

Thanks so much,

Mike

-----Original Message-----
From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Kent Hoxsey
Sent: Wednesday, November 21, 2012 3:03 PM
To: r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?

After working through a similar set of issues a while back, I discovered this very useful web page:

http://www.epochconverter.com/

Short summary: use your RDBMS to do the conversion to epoch time, and hand that number to R. Do not write SQL to write text to parse in R.

In my experience, the task of converting from a string representation to POSIXct representation in R can be quite, erm, tedious when you're working with a lot of data. In contrast, doing the conversion in a source database (Oracle for me) is really fast, as are the downstream operations in R.

Caveat: I have not attempted to preserve or work with the millis, so you might need to do some prototyping to get your conversion right coming out Postgres, but that web site includes quite a bit of discussion about the topic.

HTH.
On Nov 21, 2012, at 10:16 AM, Michael Lyszczek wrote:

> Hi,
> 
> I am new to R and have been really having a great time dealing with millisecond timestamps in financial data.
> 
> I can get them to work in R using read.csv() then then converting the time stamp column using  as.POSIXct(DATA$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
> 
> However, when I use DBI and PgSql to query a table for ticks / bbo I have no idea how to specify not to drop off the trailing millisecond.
> <code>
> 
> options(digs.secs = 3);
> 
> library("DBI")
> library("RPostgreSQL")
> 
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, dbname="market_data", host="host", user="user", password="password")
> NGZ2 <- dbGetQuery(con, "select * from trades blah blah blah");
> 
> </code>
> 
> Summary(NGZ2) yields:
>> summary(NGZ2zoo)
> instrument_id    table_trade_id      trade_size     trade_price
> Min.   :584445   Min.   :2786130   Min.   : 2500   Min.   :3708
> 1st Qu.:584445   1st Qu.:2790335   1st Qu.: 2500   1st Qu.:3742
> Median :584445   Median :2795493   Median : 2500   Median :3760
> Mean   :584445   Mean   :2795617   Mean   : 3070   Mean   :3759
> 3rd Qu.:584445   3rd Qu.:2800727   3rd Qu.: 2500   3rd Qu.:3776
> Max.   :584445   Max.   :2806760   Max.   :70000   Max.   :3833
> exchange_generated_price    trade_id          timestamp
> Mode :logical            Min.   :65001155   Min.   :2012-11-19 00:49:07
> FALSE:2761               1st Qu.:65767044   1st Qu.:2012-11-19 14:25:28
> TRUE :474                Median :66313346   Median :2012-11-19 16:25:25
> NA's :0                  Mean   :66306294   Mean   :2012-11-19 16:23:35
>                          3rd Qu.:66880936   3rd Qu.:2012-11-19 18:50:35
>                          Max.   :67377417   Max.   :2012-11-19 23:02:20
> agressor_side_isbuy
> Min.   :-1.00000
> 1st Qu.:-1.00000
> Median : 0.00000
> Mean   :-0.06801
> 3rd Qu.: 1.00000
> Max.   : 1.00000
> 
> 
> Meanwhile using a read.csv() for a similar dataset yields:
> 
> 
> NGX2  <- read.csv("iNGCal1310-12-2012.csv", header=TRUE)
> 
> 
> 
>> summary(NGX2)
> 
>                   TIMESTAMP         BID_SIZE          BID
> 
> 2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
> 
> 2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
> 
> 2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
> 
> 2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
> 
> 2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
> 
> 2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
> 
> (Other)                :119621
> 
>      ASK           ASK_SIZE
> 
> Min.   :3.904   Min.   : 2500
> 
> 1st Qu.:3.919   1st Qu.: 2500
> 
> Median :3.935   Median : 5000
> 
> Mean   :3.938   Mean   : 4667
> 
> 3rd Qu.:3.950   3rd Qu.: 7500
> 
> Max.   :4.007   Max.   :15000
> 
> 
> 
> After processing with as.PosixCT:
> 
> 
> 
>> NGX2$TS <-as.POSIXct(NGX2$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
> 
>> summary(NGX2)
> 
>                   TIMESTAMP         BID_SIZE          BID
> 
> 2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
> 
> 2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
> 
> 2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
> 
> 2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
> 
> 2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
> 
> 2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
> 
> (Other)                :119621
> 
>      ASK           ASK_SIZE           TS
> 
> Min.   :3.904   Min.   : 2500   Min.   :2012-10-22 00:00:00
> 
> 1st Qu.:3.919   1st Qu.: 2500   1st Qu.:2012-10-22 13:59:36
> 
> Median :3.935   Median : 5000   Median :2012-10-22 14:56:57
> 
> Mean   :3.938   Mean   : 4667   Mean   :2012-10-22 14:52:21
> 
> 3rd Qu.:3.950   3rd Qu.: 7500   3rd Qu.:2012-10-22 16:04:16
> 
> Max.   :4.007   Max.   :15000   Max.   :2012-10-22 16:52:08
> 
> 
> 
> 
> Is there any way to specify date format for a dbGetQuery () ?
> 
> 
> Straight from SQL:
> instrument_id |         timestamp          | best_bid_size | best_bid | best_offer | best_offer_size | data_id
> ---------------+----------------------------+---------------+----------+------------+-----------------+----------
>        592802 | 2012-11-19 13:39:52.855-05 |          5000 |    3.864 |      3.872 |            7500 | 17632624
>       1461673 | 2012-11-19 13:39:52.855-05 |          5000 |    -0.26 |     -0.237 |            2500 | 17632625
>        592790 | 2012-11-19 13:39:52.855-05 |          5000 |    3.951 |       3.96 |            2500 | 17632626
>        595724 | 2012-11-19 13:39:52.855-05 |          7500 |   -0.404 |      -0.37 |            2500 | 17632627
>        592792 | 2012-11-19 13:39:52.855-05 |          7500 |    3.897 |        3.9 |           10000 | 17632628
>        592796 | 2012-11-19 13:39:52.855-05 |          2500 |    3.865 |      3.868 |           10000 | 17632629
>        592792 | 2012-11-19 13:39:52.855-05 |         10000 |    3.897 |        3.9 |           10000 | 17632630
>        592796 | 2012-11-19 13:39:52.855-05 |          5000 |    3.865 |      3.868 |           10000 | 17632631
>        592787 | 2012-11-19 13:39:52.856-05 |         10000 |    3.837 |      3.842 |           10000 | 17632632
>        592792 | 2012-11-19 13:39:52.856-05 |          2500 |    3.898 |        3.9 |           10000 | 17632633
>        592793 | 2012-11-19 13:39:52.856-05 |         10000 |    3.887 |      3.891 |           10000 | 17632634
>        592796 | 2012-11-19 13:39:52.856-05 |          7500 |    3.865 |      3.868 |           10000 | 17632635
>        592787 | 2012-11-19 13:39:52.856-05 |         12500 |    3.837 |      3.842 |           10000 | 17632636
>        592792 | 2012-11-19 13:39:52.856-05 |          5000 |    3.898 |        3.9 |           10000 | 17632637
>        592793 | 2012-11-19 13:39:52.856-05 |         12500 |    3.887 |      3.891 |           10000 | 17632638
>        592796 | 2012-11-19 13:39:52.856-05 |         10000 |    3.865 |      3.868 |           10000 | 17632639
>        592787 | 2012-11-19 13:39:52.856-05 |          2500 |    3.838 |      3.842 |           10000 | 17632640
> 
> 
> **********************************************************************
> This e-mail is confidential and for the individual or entity named above and may contain privileged information. If you are not the intended recipient, please notify us immediately by return e-mail or telephone and destroy this message. This communication is for informational purp...{{dropped:22}}
> 
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.


	[[alternative HTML version deleted]]

_______________________________________________
R-SIG-Finance at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.



More information about the R-SIG-Finance mailing list