[R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?

Kent Hoxsey khoxsey at gmail.com
Wed Nov 21 21:28:40 CET 2012


Ah, now it would appear you are asking simple R questions (rather than complicated finance questions) so I am going to point you toward some good sources. You will definitely want to do a little spelunking in the help, and there's a ton of good stuff (including examples) for working with time series on Stack Overflow (esp search for topics tagged with "r" and "xts").

Display of the number of digits is controlled by the options() function, read about it with help(options) and set it with something like: options(digits=10). You will want to read up on POSIXct(), particularly the "Sub-second Accuracy" section.

Your SQL  is most likely coming back into R as a dataframe (help(data.frame)) which you can look at using head() or tail() or str(). In combination with the digits settings, you will likely be able to see all of the digits you seek.

On Nov 21, 2012, at 12:22 PM, Michael Lyszczek wrote:

> I am now doing this : 
> 
> select cast(extract(epoch from timestamp) as float) from trades order by timestamp;
> 
> this gives me a list that looks like .....
> 1353381185.089
> 1353381185.089
>  1353381189.96
>  1353381189.96
>  1353381189.96
> 1353381190.353
> 1353381190.353
> 1353381190.353
> 1353381191.139
> 1353381191.139
> 1353381191.139
>  1353381256.04
> 1353381855.805
>  1353381864.52
> 1353384140.324
> 
> Etc....
> 
> I try to bring it into R the same way, so new query:
> 
> NGZ2zoo <- dbGetQuery(con, "select cast(extract(epoch from timestamp) as float), instrument_id, trade_size, trade_price, exchange_generated_price, trade_id, agressor_side_isbuy from trades where trades.instrument_id = (select CAST(max(securities.\"UnderlyingSymbol\") as INT) from securities where securities.\"FNYShortName\" = 'iNGZ2')");
> Summary(NGZ2zoo)
>   date_part        instrument_id      trade_size     trade_price  
> Min.   :1.35e+09   Min.   :584445   Min.   : 2500   Min.   :3708  
> 1st Qu.:1.35e+09   1st Qu.:584445   1st Qu.: 2500   1st Qu.:3742  
> Median :1.35e+09   Median :584445   Median : 2500   Median :3760  
> Mean   :1.35e+09   Mean   :584445   Mean   : 3070   Mean   :3759  
> 3rd Qu.:1.35e+09   3rd Qu.:584445   3rd Qu.: 2500   3rd Qu.:3776  
> Max.   :1.35e+09   Max.   :584445   Max.   :70000   Max.   :3833  
> exchange_generated_price    trade_id        agressor_side_isbuy
> Mode :logical            Min.   :65001155   Min.   :-1.000     
> FALSE:2761               1st Qu.:65767044   1st Qu.:-1.000     
> TRUE :474                Median :66313346   Median : 0.000     
> NA's :0                  Mean   :66306294   Mean   :-0.068     
>                          3rd Qu.:66880936   3rd Qu.: 1.000     
>                          Max.   :67377417   Max.   : 1.000     
> 
> as.POSIXct(NGZ2zoo$date_part, origin="1970-01-01")
> yields:
> [3199] "2012-11-20 02:12:29 EST" "2012-11-20 00:29:30 EST"
> [3201] "2012-11-20 02:15:27 EST" "2012-11-20 00:29:31 EST"
> [3203] "2012-11-20 00:29:31 EST" "2012-11-20 00:29:33 EST"
> [3205] "2012-11-20 02:15:28 EST" "2012-11-20 02:15:33 EST"
> [3207] "2012-11-20 02:15:33 EST" "2012-11-20 02:16:11 EST"
> [3209] "2012-11-20 02:18:21 EST" "2012-11-20 02:18:58 EST"
> [3211] "2012-11-20 00:29:33 EST" "2012-11-20 00:29:34 EST"
> [3213] "2012-11-20 00:29:40 EST" "2012-11-20 02:22:04 EST"
> [3215] "2012-11-20 00:29:43 EST" "2012-11-20 02:22:31 EST"
> [3217] "2012-11-20 02:22:33 EST" "2012-11-20 02:26:23 EST"
> [3219] "2012-11-20 02:26:23 EST" "2012-11-20 02:27:53 EST"
> [3221] "2012-11-20 02:29:30 EST" "2012-11-20 02:29:36 EST"
> [3223] "2012-11-20 02:29:39 EST" "2012-11-20 02:47:05 EST"
> [3225] "2012-11-20 02:47:05 EST" "2012-11-20 02:47:05 EST"
> [3227] "2012-11-20 02:51:25 EST" "2012-11-20 02:53:37 EST"
> [3229] "2012-11-20 02:53:44 EST" "2012-11-20 02:55:53 EST"
> [3231] "2012-11-20 02:58:20 EST" "2012-11-20 03:14:16 EST"
> [3233] "2012-11-20 03:24:15 EST" "2012-11-20 03:24:24 EST"
> [3235] "2012-11-20 04:02:20 EST"
> 
> 
> Even if I just do the system time:
>> as.POSIXct(Sys.time(),  format = "%Y-%m-%d %H:%M:%OS")
> [1] "2012-11-21 15:07:59 EST"
> 
> 
> Is it just stored as a float? Or numeric? 
> 
> Any way you can see the actual value, not scientific notation?
> 
> Thanks so much,
> 
> Mike
> 
> -----Original Message-----
> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Kent Hoxsey
> Sent: Wednesday, November 21, 2012 3:03 PM
> To: r-sig-finance at r-project.org
> Subject: Re: [R-SIG-Finance] Millisecond timestamp from PgSql using DBI / RPostgreSQL ? Anyway to set up a default timeseries format for the timestamp type?
> 
> After working through a similar set of issues a while back, I discovered this very useful web page:
> 
> http://www.epochconverter.com/
> 
> Short summary: use your RDBMS to do the conversion to epoch time, and hand that number to R. Do not write SQL to write text to parse in R.
> 
> In my experience, the task of converting from a string representation to POSIXct representation in R can be quite, erm, tedious when you're working with a lot of data. In contrast, doing the conversion in a source database (Oracle for me) is really fast, as are the downstream operations in R.
> 
> Caveat: I have not attempted to preserve or work with the millis, so you might need to do some prototyping to get your conversion right coming out Postgres, but that web site includes quite a bit of discussion about the topic.
> 
> HTH.
> On Nov 21, 2012, at 10:16 AM, Michael Lyszczek wrote:
> 
>> Hi,
>> 
>> I am new to R and have been really having a great time dealing with millisecond timestamps in financial data.
>> 
>> I can get them to work in R using read.csv() then then converting the time stamp column using  as.POSIXct(DATA$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
>> 
>> However, when I use DBI and PgSql to query a table for ticks / bbo I have no idea how to specify not to drop off the trailing millisecond.
>> <code>
>> 
>> options(digs.secs = 3);
>> 
>> library("DBI")
>> library("RPostgreSQL")
>> 
>> drv <- dbDriver("PostgreSQL")
>> con <- dbConnect(drv, dbname="market_data", host="host", user="user", password="password")
>> NGZ2 <- dbGetQuery(con, "select * from trades blah blah blah");
>> 
>> </code>
>> 
>> Summary(NGZ2) yields:
>>> summary(NGZ2zoo)
>> instrument_id    table_trade_id      trade_size     trade_price
>> Min.   :584445   Min.   :2786130   Min.   : 2500   Min.   :3708
>> 1st Qu.:584445   1st Qu.:2790335   1st Qu.: 2500   1st Qu.:3742
>> Median :584445   Median :2795493   Median : 2500   Median :3760
>> Mean   :584445   Mean   :2795617   Mean   : 3070   Mean   :3759
>> 3rd Qu.:584445   3rd Qu.:2800727   3rd Qu.: 2500   3rd Qu.:3776
>> Max.   :584445   Max.   :2806760   Max.   :70000   Max.   :3833
>> exchange_generated_price    trade_id          timestamp
>> Mode :logical            Min.   :65001155   Min.   :2012-11-19 00:49:07
>> FALSE:2761               1st Qu.:65767044   1st Qu.:2012-11-19 14:25:28
>> TRUE :474                Median :66313346   Median :2012-11-19 16:25:25
>> NA's :0                  Mean   :66306294   Mean   :2012-11-19 16:23:35
>>                         3rd Qu.:66880936   3rd Qu.:2012-11-19 18:50:35
>>                         Max.   :67377417   Max.   :2012-11-19 23:02:20
>> agressor_side_isbuy
>> Min.   :-1.00000
>> 1st Qu.:-1.00000
>> Median : 0.00000
>> Mean   :-0.06801
>> 3rd Qu.: 1.00000
>> Max.   : 1.00000
>> 
>> 
>> Meanwhile using a read.csv() for a similar dataset yields:
>> 
>> 
>> NGX2  <- read.csv("iNGCal1310-12-2012.csv", header=TRUE)
>> 
>> 
>> 
>>> summary(NGX2)
>> 
>>                  TIMESTAMP         BID_SIZE          BID
>> 
>> 2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
>> 
>> 2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
>> 
>> 2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
>> 
>> 2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
>> 
>> 2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
>> 
>> 2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
>> 
>> (Other)                :119621
>> 
>>     ASK           ASK_SIZE
>> 
>> Min.   :3.904   Min.   : 2500
>> 
>> 1st Qu.:3.919   1st Qu.: 2500
>> 
>> Median :3.935   Median : 5000
>> 
>> Mean   :3.938   Mean   : 4667
>> 
>> 3rd Qu.:3.950   3rd Qu.: 7500
>> 
>> Max.   :4.007   Max.   :15000
>> 
>> 
>> 
>> After processing with as.PosixCT:
>> 
>> 
>> 
>>> NGX2$TS <-as.POSIXct(NGX2$TIMESTAMP, format = "%Y-%m-%d %H:%M:%OS");
>> 
>>> summary(NGX2)
>> 
>>                  TIMESTAMP         BID_SIZE          BID
>> 
>> 2012-10-22 16:04:19.561:    12   Min.   : 2500   Min.   :3.795
>> 
>> 2012-10-22 13:21:33.533:    11   1st Qu.: 2500   1st Qu.:3.911
>> 
>> 2012-10-22 13:54:06.927:    11   Median : 2500   Median :3.927
>> 
>> 2012-10-22 14:44:43.331:    11   Mean   : 4356   Mean   :3.929
>> 
>> 2012-10-22 15:50:03.755:    11   3rd Qu.: 5000   3rd Qu.:3.943
>> 
>> 2012-10-22 16:09:08.757:    11   Max.   :15000   Max.   :3.989
>> 
>> (Other)                :119621
>> 
>>     ASK           ASK_SIZE           TS
>> 
>> Min.   :3.904   Min.   : 2500   Min.   :2012-10-22 00:00:00
>> 
>> 1st Qu.:3.919   1st Qu.: 2500   1st Qu.:2012-10-22 13:59:36
>> 
>> Median :3.935   Median : 5000   Median :2012-10-22 14:56:57
>> 
>> Mean   :3.938   Mean   : 4667   Mean   :2012-10-22 14:52:21
>> 
>> 3rd Qu.:3.950   3rd Qu.: 7500   3rd Qu.:2012-10-22 16:04:16
>> 
>> Max.   :4.007   Max.   :15000   Max.   :2012-10-22 16:52:08
>> 
>> 
>> 
>> 
>> Is there any way to specify date format for a dbGetQuery () ?
>> 
>> 
>> Straight from SQL:
>> instrument_id |         timestamp          | best_bid_size | best_bid | best_offer | best_offer_size | data_id
>> ---------------+----------------------------+---------------+----------+------------+-----------------+----------
>>       592802 | 2012-11-19 13:39:52.855-05 |          5000 |    3.864 |      3.872 |            7500 | 17632624
>>      1461673 | 2012-11-19 13:39:52.855-05 |          5000 |    -0.26 |     -0.237 |            2500 | 17632625
>>       592790 | 2012-11-19 13:39:52.855-05 |          5000 |    3.951 |       3.96 |            2500 | 17632626
>>       595724 | 2012-11-19 13:39:52.855-05 |          7500 |   -0.404 |      -0.37 |            2500 | 17632627
>>       592792 | 2012-11-19 13:39:52.855-05 |          7500 |    3.897 |        3.9 |           10000 | 17632628
>>       592796 | 2012-11-19 13:39:52.855-05 |          2500 |    3.865 |      3.868 |           10000 | 17632629
>>       592792 | 2012-11-19 13:39:52.855-05 |         10000 |    3.897 |        3.9 |           10000 | 17632630
>>       592796 | 2012-11-19 13:39:52.855-05 |          5000 |    3.865 |      3.868 |           10000 | 17632631
>>       592787 | 2012-11-19 13:39:52.856-05 |         10000 |    3.837 |      3.842 |           10000 | 17632632
>>       592792 | 2012-11-19 13:39:52.856-05 |          2500 |    3.898 |        3.9 |           10000 | 17632633
>>       592793 | 2012-11-19 13:39:52.856-05 |         10000 |    3.887 |      3.891 |           10000 | 17632634
>>       592796 | 2012-11-19 13:39:52.856-05 |          7500 |    3.865 |      3.868 |           10000 | 17632635
>>       592787 | 2012-11-19 13:39:52.856-05 |         12500 |    3.837 |      3.842 |           10000 | 17632636
>>       592792 | 2012-11-19 13:39:52.856-05 |          5000 |    3.898 |        3.9 |           10000 | 17632637
>>       592793 | 2012-11-19 13:39:52.856-05 |         12500 |    3.887 |      3.891 |           10000 | 17632638
>>       592796 | 2012-11-19 13:39:52.856-05 |         10000 |    3.865 |      3.868 |           10000 | 17632639
>>       592787 | 2012-11-19 13:39:52.856-05 |          2500 |    3.838 |      3.842 |           10000 | 17632640
>> 
>> 
>> **********************************************************************
>> This e-mail is confidential and for the individual or entity named above and may contain privileged information. If you are not the intended recipient, please notify us immediately by return e-mail or telephone and destroy this message. This communication is for informational purp...{{dropped:22}}
>> 
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions should go.
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



More information about the R-SIG-Finance mailing list