[R-SIG-Finance] Efficiency of data.frame to xts conversion

Dirk Eddelbuettel edd at debian.org
Wed May 2 23:46:08 CEST 2012


On 2 May 2012 at 17:18, Gordon Erlebacher wrote:
| Hi everybody, 
| 
| I have a MySQL database with stock information (using beancounter: thanks
| Kirk). 

Interesting off-by-one error...

| I return 30 stocks from the database (data.frame format by default)
| I am converting the data.frame to xts format. In the code below, 
| 
| 
| "df"   is the data.frame returned from the database. 
| "data" is what I get after conversion of df to an xts object. 
| 
| 
| Here is the problem: 
| 
| Getting 30 stocks from the data.base and return a data.frame takes 0.14 seconds
| Including conversion to the more efficient xts structure, takes 2 seconds. 
| 
| If I had 3000 stocks, it is the difference between 14 sec and 3 min, which is
| quite large. 
| Is there a solution to this dilemma? Of course, I could recode the database
| access functions. 
| 
| One solution of course, is to convert to xts and save to a file, so that
| conversion is no longer necessary, and I might do that if nobody comes up with
| a more direct solution.

You are presumably getting killed by the as.Date which you don't need (I
think).  

With PostgreSQL, my preferred db backend, I know that the SQL date/datetime columns
come back as Date/Datetime in R.  Hence the timing increase it is not all
that big when I also convert to xts.

Twenty msec for the db access:

R> system.time(SPY <- dbGetQuery(con, "select date, day_close as price from stockprices where symbol='SPY' order by date"))
   user  system elapsed 
  0.020   0.000   0.039 
R> head(SPY)
        date   price
1 1993-01-29 37.8940
2 1993-02-01 38.1635
3 1993-02-02 38.2444
4 1993-02-03 38.6487
5 1993-02-04 38.8104
6 1993-02-05 38.7834

And around 100 msec for db access and xts conversion:

R> system.time({ SPY <- dbGetQuery(con, "select date, day_close as price from stockprices where symbol='SPY' order by date"); spy <- xts(SPY[,-1], order.by=SPY[,1]) })
   user  system elapsed 
  0.104   0.000   0.121 
R> head(spy)
              [,1]
1993-01-29 37.8940
1993-02-01 38.1635
1993-02-02 38.2444
1993-02-03 38.6487
1993-02-04 38.8104
1993-02-05 38.7834
R> 


To really know you should profile.  And none of this really is a finance
question yet...

Dirk

-- 
R/Finance 2012 Conference on May 11 and 12, 2012 at UIC in Chicago, IL
See agenda, registration details and more at http://www.RinFinance.com



More information about the R-SIG-Finance mailing list