[R-SIG-Finance] Unreliable Yahoo downloads, so possibly save/reread using MySQL?

Cedrick Johnson cjohnson at backstopsolutions.com
Fri Dec 6 01:59:28 CET 2013


Mark-
Here's what I do in a automated fashion for downloading data from Yahoo post close (and storing into a mysql DB). The code may be a bit crummy (I cringe at my ALTER TABLE query but it works well for the 100 or so things I download every evening) but this should be a good starting point for refining. 

##EOD Downloader file, triggered by cron nightly after 8pm.. Define your symbols in a vector
library(quantmod)
library(RMySQL)

MySymbols = c("MRK","GOOG","AAPL")

getSymbols(MySymbols, src="yahoo")
dbc = dbConnect(MySQL(), user="youruser", pass="yourpass", host="yourhost", dbname="yourdb")
for(I in 1:length(MySymbols) {
	dbWriteTable(dbc, name=MySymbols[i],value=as.data.frame(get(MySymbols[i])), overwrite=TRUE)
	res = dbSendQuery(dbc, paste("ALTER TABLE `yourtable`.`",MySymbols[i],
    		"` CHANGE COLUMN `row_names` `Date` DATE NULL DEFAULT NULL,
		CHANGE COLUMN `", MySymbols [i],"_Open` `o` DOUBLE NULL DEFAULT NULL,
		CHANGE COLUMN `", MySymbols [i],"_High` `h` DOUBLE NULL DEFAULT NULL,
		CHANGE COLUMN `", MySymbols [i], "_Low` `l` DOUBLE NULL DEFAULT NULL,
		CHANGE COLUMN `", MySymbols [i], "_Close` `c` DOUBLE NULL DEFAULT NULL,
		CHANGE COLUMN `", MySymbols [i], "_Volume` `v` DOUBLE NULL DEFAULT NULL, 
		CHANGE COLUMN `",MySymbols[i],"_Adjusted` `a` DOUBLE NULL DEFAULT;", sep=""))
	dbClearResult(res)
}

Now, from quantmod:
setDefaults(getSymbols.MySQL, user="youruser", password="yourpass", host="yourhost",port=3306, dbname="yourdb")

getSymbols("MRK", src="MySQL")


Hope this helps,
CJ

-----Original Message-----
From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Mark Knecht
Sent: Thursday, December 05, 2013 1:55 PM
To: R-Finance
Subject: [R-SIG-Finance] Unreliable Yahoo downloads, so possibly save/reread using MySQL?

Hi,
   I read an ETF rotation paper a few weeks ago on Seeking Alpha and thought I'd play with the ideas in R. Without pointing any fingers (at
Yahoo) I find that downloading adjusted data to be highly unreliable, especially in the middle of the work day. The following code snippet is, at the time I'm writing this, failing on one or another of the 5 downloads nearly 100% of the time. It's not always the same symbols that fail. That seems random to me. However if I change MyAdjust to FALSE then the downloads seem to work perfectly. My experience is that after the market closes the adjusted data works most of the time.

   The sort of failure I see looks like the following (which did not use a to=Date_End) just as a test. In this case it was VNQ/REM that failed. I checked the Yahoo site and was able to see the data by hand:

> getSymbols(MySymbols[[1]], from = Date_Start, adjust = MyAdjust, src = 
> SymbolSrc)
[1] "VTI" "VEU"
> getSymbols(MySymbols[[2]], from = Date_Start, adjust = MyAdjust, src = 
> SymbolSrc)
[1] "HYG" "CIU"
> getSymbols(MySymbols[[3]], from = Date_Start, adjust = MyAdjust, src = 
> SymbolSrc)
Error in download.file(paste(yahoo.URL, Symbol.name, "&a=", from.m, "&b=",  :
  cannot open URL
'http://ichart.finance.yahoo.com/x?s=VNQ&a=0&b=01&c=2000&d=11&e=05&f=2013&g=v&y=0&z=30000'
In addition: Warning message:
In download.file(paste(yahoo.URL, Symbol.name, "&a=", from.m, "&b=",  :
  cannot open: HTTP status was '0 (null)'
> getSymbols(MySymbols[[4]], from = Date_Start, adjust = MyAdjust, src = 
> SymbolSrc)
[1] "GLD" "TLT"
> getSymbols(MySymbols[[5]], from = Date_Start, adjust = MyAdjust, src = 
> SymbolSrc)
[1] "SHY"

   Anyway, to implement my reading of this paper, I really need adjusted data to take dividends into account and it seems silly to me to keep downloading the same data over and over from Yahoo as I play with this R code. To that end I'm wondering if anyone has a code example for saving Yahoo data once to MySQL and then (presumably) changing what I call SymbolSrc=MySQL for subsequent runs when I work on the R code? I've been searching around in StackOverflow as well as Google and haven't yet found anything very executable.

   Please note that I'm a retired investor, not a trader, and seldom work with data faster than 30 minute bars and even then almost never real-time data, at least in terms of what I would store in MySQL so I'm not overly concerned right now as to whether MySQL is fast or slow. I already run MySQL for a couple of other reasons on my Gentoo box where I'm working in R and would prefer not run another database right now unless _necessary_.

   Clearly, if there's some stupid issue in my code that fixes this that would be great to know about also! :-)

Thanks in advance,
Mark



library(quantmod)

Date_Start = "2000-01-01"
Date_End = format(Sys.time(), "%Y-%m-%d") Sys.setenv(TZ = "UTC") SymbolSrc = "yahoo"
MyAdjust = TRUE

MySymbols = list()
MySymbols[[1]]= c("VTI","VEU")
MySymbols[[2]]= c("HYG","CIU")
MySymbols[[3]]= c("VNQ","REM")
MySymbols[[4]]= c("GLD","TLT")
MySymbols[[5]]= c("SHY")

getSymbols(MySymbols[[1]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[2]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[3]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[4]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[5]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc)

_______________________________________________
R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.



More information about the R-SIG-Finance mailing list