[R-SIG-Finance] Keeping persistent data collections

G See gsee000 at gmail.com
Mon Nov 7 13:47:11 CET 2011


I do what Brian described, and I use a couple functions from
FinancialInstrument to do it.

library(FinancialInstrument)
?saveSymbols.days
?saveSymbols.common
?getSymbols.FI

(I just noticed that those two saveSymbols.* functions do not allow for a
file extension other than the old .rda.  I will probably update that today.)

I put together a little example, which I'll attach as well as paste below.

This is how I do it, but I certainly encourage suggestions for improvement.

HTH,
Garrett

> library(FinancialInstrument)

> # object with daily periodicity
> data(sample_matrix)
> DDD <- as.xts(sample_matrix)

> # object with minute periodicity
> AAA <- xts(rnorm(10000), Sys.time()-(60*1:10000))
> AAA <- align.time(AAA)
> colnames(AAA) <- "AAA"

> # look at the objects we're going to store
> head(AAA)
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344

> head(DDD)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185

> mydir <- getwd()
> saveSymbols.days("AAA", base_dir=mydir)
> saveSymbols.common("DDD", base_dir=mydir)

> # now that they are on disk,
> # remove them from workspace
> rm("AAA", "DDD")

> # get from disk
> getSymbols("AAA", src='FI', dir=mydir, split_method='days',
+            from='2011-10-31')
[1] "AAA"

> getSymbols("DDD", src='FI', dir=mydir, split_method='common')
[1] "DDD"

> head(AAA)
                            AAA
2011-10-31 09:04:00  0.05152989
2011-10-31 09:05:00  0.12797379
2011-10-31 09:06:00  0.96025183
2011-10-31 09:07:00 -0.23265907
2011-10-31 09:08:00  1.77706849
2011-10-31 09:09:00 -1.29139344

> head(DDD)
               Open     High      Low    Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185

> #--------
> # You can setSymbolLookup so that getSymbols will know where
> # to look.  There are 2 ways to setSymbolLookup: explicitly,
> # or by setting the "src" field of an instrument.
>
> # explicitly
> setSymbolLookup(DDD=list(src='FI', dir=mydir, split_method='common'))

> getSymbols("DDD")
[1] "DDD"

> # by using the "src" field of an instrument
> stock("AAA", currency("USD"), src=list(src='FI', dir=mydir,
+       split_method='days'))
[1] "AAA"

> getSymbols("AAA", from='2011-10-31')
[1] "AAA"

> # cleanup
> rm("AAA", "DDD")
> unlink("AAA", recursive=TRUE)
> unlink("DDD", recursive=TRUE)
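
For anyone curious what the one-file-per-day layout amounts to, here is a
minimal sketch using only xts and base R.  It is not the FinancialInstrument
implementation: the helper names save_by_day and load_days, and the
"YYYY.MM.DD.symbol.rda" file naming, are assumptions made for illustration.

```r
library(xts)

# Hypothetical helper: write one .rda file per calendar day into
# base_dir/<symbol>/.  The file naming is an assumption of this sketch.
save_by_day <- function(x, symbol, base_dir) {
  sym_dir <- file.path(base_dir, symbol)
  dir.create(sym_dir, recursive = TRUE, showWarnings = FALSE)
  days <- split(x, f = "days")                 # list of xts, one per day
  for (i in seq_along(days)) {
    stamp <- format(index(days[[i]])[1], "%Y.%m.%d")
    env <- new.env()
    assign(symbol, days[[i]], envir = env)     # save under the symbol's name
    save(list = symbol, envir = env,
         file = file.path(sym_dir, paste(stamp, symbol, "rda", sep = ".")))
  }
  invisible(sym_dir)
}

# Hypothetical helper: load only the daily files between `from` and `to`
# and bind them back into a single xts object.
load_days <- function(symbol, base_dir, from, to) {
  files  <- list.files(file.path(base_dir, symbol), pattern = "\\.rda$",
                       full.names = TRUE)
  stamps <- as.Date(substr(basename(files), 1, 10), format = "%Y.%m.%d")
  keep   <- files[stamps >= as.Date(from) & stamps <= as.Date(to)]
  pieces <- lapply(keep, function(f) {
    env <- new.env()
    load(f, envir = env)
    get(symbol, envir = env)
  })
  do.call(rbind, pieces)
}

# Three days of synthetic one-minute data, stored and partially reloaded
idx <- as.POSIXct("2011-10-31 00:00:00", tz = "UTC") + 60 * (0:4319)
x   <- xts(seq_along(idx), order.by = idx)
base_dir <- tempfile("fi_sketch")
save_by_day(x, "AAA", base_dir)
y <- load_days("AAA", base_dir, from = "2011-11-01", to = "2011-11-02")
```

Because each day lives in its own file, a request for a date range only
touches the files inside that range, which is what keeps loading several
years of intraday data manageable.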



On Mon, Nov 7, 2011 at 3:20 AM, Brian G. Peterson <brian at braverock.com> wrote:

> On Sun, 2011-11-06 at 22:43 -0500, Dino Veritas wrote:
> > Hello, I recently found this list and have been reading deeply the
> > archives. I am wondering how people here maintain their collections of
> > data for easy use in R. I am wondering a few things:
> >
> > 1) How do members of this list deal with keeping persistent data
> > collections with R? I was thinking of individual xts objects by asset and
> > frequency (such as AAPL daily, AAPL minute, AAPL 60m, etc). While I can
> > store and maintain these xts objects on disk and load them into R as
> > needed, I am wondering if there is a better solution.
>
> I store only tick data, as I can easily get to any other frequency from
> tick.  I've considered also storing daily data, but in the end I decided
> it was too much trouble to (additionally) manage, and just store tick.
>
> > 2) Coming from that, I have been looking into the indexing package for my
> > needs. It seems very useful for managing a lot of large data sets in
> > memory, but I am not sure it is a good method for maintaining persistent
> > data. I have had trouble adding information to existing data that is
> > indexed on disk. Do posters here use indexing for this purpose? I did find
> > an old post or two touching on that with no specifics. I would like to be
> > able to combine the ability of indexing to have many large data sets
> > available in memory with persistent storage of data. Has anyone any
> > experience doing this?
>
> You are correct that the 'indexing' package is very powerful.  It is
> also not done yet.
>
> As I said, I store tick data.  The way I do this is with single files
> per day of data per symbol, pre-parsed into xts objects and stored to
> disk in one directory per symbol (using 'save').
>
> I then use FinancialInstrument to keep track of all the instrument
> metadata, and getSymbols to load the data into R when I need it (and
> over the time-frames that I require).  We currently download tick data
> for about 2500 tradeable instruments per day, and maintain archives
> going back several years.  We have the .instrument environment stored on
> the same file server as the data, and every .Rprofile in the firm points
> to this so that everyone has access to getInstrument and getSymbols.
>
> I know someone who works in the hedge fund industry, mostly with monthly
> data, with some daily data sprinkled in.  He uses the same approach I
> have outlined of storing the metadata in FinancialInstrument, and
> getSymbols to access the data.  He typically stores one consolidated CSV
> file per instrument, because CSV files are easy to add on to with a
> batch process.
>
> For lower frequency data (let's say daily or lower) a database is
> certainly an option, and there are getSymbols wrappers that could be
> adapted to whatever schema you decided to use. Obviously, there are tick
> data database providers such as OneTick and kdb, and if you have this
> problem and the resources to need this type of solution, you probably
> already know that you are in this camp, and know that these providers
> have R interfaces of varying quality.
>
> The FinancialInstrument package has a 'parsers' directory included in
> the 'inst' directory of the package with many examples of download and
> parse routines for regular loading of data from a variety of free or
> subscription providers.  This should give you a lot of material to begin
> working with your own data providers.
>
> > 3) How do people keep track of all the data sets within R? Are there any
> > useful packages for keeping track of multiple sets of financial data and
> > the information about them?
>
> We wrote and use FinancialInstrument for this purpose.
>
> As I said earlier, I see no value in storing different periodicities,
> and store only tick.
>
> One of the reasons that I chose to write a getSymbols wrapper for
> retrieving our tick data stores is that resources like this list have
> extensive experience about using getSymbols, and it is therefore easy
> for people at our firm to become familiar with using the data.
>
> Also, I am reasonably confident that as the indexing package matures,
> there will be a getSymbols method for it as well, and if appropriate I
> can easily convert all my data in one batch pass and it will be
> transparent to my users.
>
> I made what I now realize to have been a mistake at a previous firm in
> writing a data retrieval function that was not compatible with
> getSymbols.  It was more complex to teach people how to use, and less
> compatible with huge amounts of other publicly available code.
>
> quantmod and FinancialInstrument contain examples of various getSymbols
> methods that may meet your needs, or that could serve as templates for
> your custom in-house data source.
>
> > 4) Any other pointers? I know many here are well versed and manage large
> > data sets with R. Any tips you have or even simply showing me in a
> > helpful direction to useful packages you use is great. This list is a
> > great help for me and I am still browsing old threads!
>
> Regards,
>
>    - Brian
>
> --
> Brian G. Peterson
> http://braverock.com/brian/
> Ph: 773-459-4973
> IM: bgpbraverock
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>
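
On Brian's point that any other frequency can be derived from the
highest-frequency data you store: a minimal sketch, assuming xts is installed
and using synthetic one-minute prices in place of real tick data.  The object
names (px, bars_5m, bars_1h, bars_1d) are made up for the example.

```r
library(xts)

set.seed(42)
# 390 one-minute stamps, 09:30 through 15:59 UTC (synthetic, not real ticks)
idx <- as.POSIXct("2011-10-31 09:30:00", tz = "UTC") + 60 * (0:389)
px  <- xts(100 + cumsum(rnorm(390, sd = 0.05)), order.by = idx)
colnames(px) <- "Price"

# Aggregate the single price column into OHLC bars at lower frequencies
bars_5m <- to.minutes5(px)                   # 5-minute bars
bars_1h <- to.period(px, period = "hours")   # hourly bars
bars_1d <- to.daily(px)                      # one bar per day

head(bars_5m)
```

Only the highest frequency has to be kept on disk; the coarser bars are cheap
to rebuild whenever they are needed.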
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rdatasaving.R
Type: text/x-r-source
Size: 1208 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20111107/dc8ee49e/attachment.bin>