[R-SIG-Finance] Older financials?
Rex Macey
rex at macey.us
Sat Nov 28 18:20:29 CET 2015
A suggestion on where to get extensive fundamental data cheaply.
This is a response to Mark's Nov 23rd message.
Consider data from the American Association of Individual Investors’
(AAII) Stock Investor Pro (SIP) software. I’ve had a lifetime membership to
the AAII for many years. For an additional, but more than reasonable, price
of $198/yr, one can license SIP. What makes this source valuable is that
it is survivorship-bias-free historical data. Subscribers have access to
the old software and data as it was when it was distributed going back
to 2003. The data include balance sheet, income statement, cash flow,
price, and many calculated fields. The list of fields
<https://www.aaii.com/files/sipro/Stock%20Investor%20Pro%20Field%20List.pdf> runs
to 22 pages. In 2003, over 8,500 companies were covered.
For info on SIP, check out the AAII
<http://www.aaii.com> webpage and
this presentation
<http://www.aaii.com/files/presentations/2011/20%20Joe%20Lan%20-%20Introduction%20to%20Stock%20Investor%20Pro.pdf>.
I downloaded about 150 install files from the AAII archives
<http://www.aaii.com/stock-investor-pro/archives> page; access to it
requires membership ($29) and a subscription. I installed them one
by one, putting each into its own directory. I downloaded the month-end
updates, though weekly data was sometimes available. I watched an entire
season of Friends while doing this and probably lost three IQ points.
Each install includes about 7 years of annual data and 8 quarters of
quarterly data.
The AAII data files are in a Foxpro/DBF format. Fortunately R has the
read.dbf
<https://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dbf.html> function
in the foreign package to handle this.
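Here is a minimal sketch of how the one-directory-per-install layout could be read with read.dbf and stacked into a single data frame. The function name read_snapshots, the file name demo.dbf, the YYYY-MM-DD directory names, and the TICKER/PE fields are all made up for illustration; the real SIP file and field names will differ, and fields may not be identical across releases.

```r
library(foreign)  # recommended package shipped with standard R installations

# Read one table from every install directory under `root` and stack the rows,
# tagging each row with the release it came from.
# Assumptions: directories are named YYYY-MM-DD, and the table has the same
# fields in every release (rbind fails otherwise).
read_snapshots <- function(root, dbf_name) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  pieces <- lapply(dirs, function(d) {
    path <- file.path(d, dbf_name)
    if (!file.exists(path)) return(NULL)     # this release lacks the table
    x <- read.dbf(path, as.is = TRUE)        # keep text fields as character
    x$SNAPSHOT <- as.Date(basename(d))
    x
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Self-contained demo: fake two releases with made-up tickers and values.
root <- file.path(tempdir(), "sip_demo")
for (d in c("2003-01-31", "2003-02-28")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  fake <- data.frame(TICKER = c("AAA", "BBB"), PE = c(12.5, 30.1),
                     stringsAsFactors = FALSE)
  write.dbf(fake, file.path(root, d, "demo.dbf"))
}

panel <- read_snapshots(root, "demo.dbf")
print(panel)
```

Keeping the release date on every row is what lets you later restrict an analysis to what was known as of a given release.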
Let me emphasize that this data is (almost) free of survivorship and
look-ahead biases. You are getting the data as SIP released it back in
the day, so companies that were around in 2003 but are not around today
are in the data set. Each release contains only data that was available
at the time of the release, so there is no look-ahead problem. I added
"almost" to cover 2 caveats. First, if you use the first install (end of
2002) to figure out companies with P/E's less than X in 2001, you've got
a survivorship bias problem, because that install only covers companies
that were still around at the end of 2002. Second, the SIP data is
available pretty much at month end, but you won't be able to trade at
month-end. If you assume that you can, you have a look-ahead bias.
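One way to respect both caveats in a backtest, sketched under stated assumptions: screen each release using only that release's rows, and date the trade some days after the release. The function screen_release, the column names, and the 5-calendar-day lag are hypothetical choices for illustration, not anything SIP prescribes.

```r
# Screen one release using only its own rows, then date the trade after the
# release so the backtest never acts before the data could have been seen.
# `lag_days` is a hypothetical allowance, counted in calendar days.
screen_release <- function(panel, release, max_pe, lag_days = 5) {
  release <- as.Date(release)
  snap <- panel[panel$SNAPSHOT == release, , drop = FALSE]
  keep <- !is.na(snap$PE) & snap$PE < max_pe   # drop missing P/E values
  picks <- snap[keep, , drop = FALSE]
  picks$TRADE_DATE <- rep(release + lag_days, nrow(picks))
  picks
}

# Made-up panel: two releases, one row per company per release.
panel <- data.frame(
  TICKER   = c("AAA", "BBB", "AAA", "CCC"),
  PE       = c(12.5, 30.1, 9.0, NA),
  SNAPSHOT = as.Date(c("2003-01-31", "2003-01-31", "2003-02-28", "2003-02-28")),
  stringsAsFactors = FALSE
)

jan <- screen_release(panel, "2003-01-31", max_pe = 15)
print(jan)
```

Because the screen never looks at a later release, a company that disappears afterwards still shows up in the releases where it existed.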
Weekly data is available beginning in 2005.
I hope this helps. If you use SIP and find data errors, I'd like to
know about them.