[R-SIG-Finance] Unreliable Yahoo downloads, so possibly save/reread using MySQL?

Mark Knecht markknecht at gmail.com
Wed Dec 18 19:28:32 CET 2013


Hi Cedric,
   I wanted to once again say thanks. I'm now downloading data into
MySQL nightly using cron and I wouldn't have gotten there without your
example. There's a still a few curiousities:

1) Why can't I get setDefaults(getSymbols.MySQL to use .my.cnf instead
of having to put a password in this example
2) Why does getSymbols reading MySQL require
c("o","h","l","c","v","a") instead of the names that dbWriteTable put
in?

but other than that it all works.

   Like you I cringed about that big ALTER TABLE command. I've not
figured out how to get rid of the row_names to Date rename but the
rest of it is easy to do in a colname rename as shown here. I think it
makes the code more readable, at least to my eyes. I broke the ALTER
TABLE stuff into multiple strings as I was studying the problem so
that I could run multiple ALTER TABLE commands more easily but in the
end ended up with only the one and didn't simplify here.

   I also put in a rm(list=ls()) command to ensure what I was seeing
was really coming from MySQL.

   It also seems possible to use dbReadTable instead of getSymbols.
That might be another option for fewer ALTER requirements.

   Anyway, thanks to you and to Mark Breman for your help.

Cheers,
Mark



##EOD Downloader file, triggered by cron nightly after 8pm.. Define
your symbols in a vector
library(quantmod)
library(RMySQL)

MySymbols = c("MRK","GOOG","AAPL")
dbname = "Cedric"
getSymbols(MySymbols, src="yahoo")


dbc = dbConnect(MySQL(), dbname=dbname)
for(i in 1:length(MySymbols)) {
  df1 = as.data.frame(get(MySymbols[i]))
  #Change column names in database of getSymbols.MySQL won't work...
  #Simplifies ALTER TABLE command later
  colnames(df1) = c("o","h","l","c","v","a")
  dbWriteTable(dbc, name=MySymbols[i],value=df1, overwrite=TRUE)

  Str1 = paste0("ALTER TABLE ", MySymbols[i])
  Str2 = paste0("CHANGE COLUMN row_names Date DATE NULL DEFAULT NULL")
  StrOut = paste(Str1, Str2)
  print(StrOut)
  res = dbSendQuery(dbc, StrOut)
  dbClearResult(res)
}

#As a test, get rid of everything so far to
#ensure data read back is from MySQL
Sys.sleep(2)
rm(list=ls())
Sys.sleep(2)

#Now, from quantmod:
#setDefaults(getSymbols.MySQL, dbname="Cedric")
setDefaults(getSymbols.MySQL, user="mark", password="password", dbname="Cedric")

MySymbols = c("MRK","GOOG","AAPL")

for (i in 1:(length(MySymbols))){
  getSymbols(MySymbols[i], src="MySQL")
}

On Thu, Dec 5, 2013 at 4:59 PM, Cedrick Johnson
<cjohnson at backstopsolutions.com> wrote:
> Mark-
> Here's what I do in a automated fashion for downloading data from Yahoo post close (and storing into a mysql DB). The code may be a bit crummy (I cringe at my ALTER TABLE query but it works well for the 100 or so things I download every evening) but this should be a good starting point for refining.
>
> ##EOD Downloader file, triggered by cron nightly after 8pm.. Define your symbols in a vector
> library(quantmod)
> library(RMySQL)
>
> MySymbols = c("MRK","GOOG","AAPL")
>
> getSymbols(MySymbols, src="yahoo")
> dbc = dbConnect(MySQL(), user="youruser", pass="yourpass", host="yourhost", dbname="yourdb")
> for(I in 1:length(MySymbols) {
>         dbWriteTable(dbc, name=MySymbols[i],value=as.data.frame(get(MySymbols[i])), overwrite=TRUE)
>         res = dbSendQuery(dbc, paste("ALTER TABLE `yourtable`.`",MySymbols[i],
>                 "` CHANGE COLUMN `row_names` `Date` DATE NULL DEFAULT NULL,
>                 CHANGE COLUMN `", MySymbols [i],"_Open` `o` DOUBLE NULL DEFAULT NULL,
>                 CHANGE COLUMN `", MySymbols [i],"_High` `h` DOUBLE NULL DEFAULT NULL,
>                 CHANGE COLUMN `", MySymbols [i], "_Low` `l` DOUBLE NULL DEFAULT NULL,
>                 CHANGE COLUMN `", MySymbols [i], "_Close` `c` DOUBLE NULL DEFAULT NULL,
>                 CHANGE COLUMN `", MySymbols [i], "_Volume` `v` DOUBLE NULL DEFAULT NULL,
>                 CHANGE COLUMN `",MySymbols[i],"_Adjusted` `a` DOUBLE NULL DEFAULT;", sep=""))
>         dbClearResult(res)
> }
>
> Now, from quantmod:
> setDefaults(getSymbols.MySQL, user="youruser", password="yourpass", host="yourhost",port=3306, dbname="yourdb")
>
> getSymbols("MRK", src="MySQL")
>
>
> Hope this helps,
> CJ
>
> -----Original Message-----
> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Mark Knecht
> Sent: Thursday, December 05, 2013 1:55 PM
> To: R-Finance
> Subject: [R-SIG-Finance] Unreliable Yahoo downloads, so possibly save/reread using MySQL?
>
> Hi,
>    I read an ETF rotation paper a few weeks ago on Seeking Alpha and thought I'd play with the ideas in R. Without pointing any fingers (at
> Yahoo) I find that downloading adjusted data to be highly unreliable, especially in the middle of the work day. The following code snippet is, at the time I'm writing this, failing on one or another of the 5 downloads nearly 100% of the time. It's not always the same symbols that fail. That seems random to me. However if I change MyAdjust to FALSE then the downloads seem to work perfectly. My experience is that after the market closes the adjusted data works most of the time.
>
>    The sort of failure I see looks like the following (which did not use a to=Date_End) just as a test. In this case it was VNQ/REM that failed. I checked the Yahoo site and was able to see the data by hand:
>
>> getSymbols(MySymbols[[1]], from = Date_Start, adjust = MyAdjust, src =
>> SymbolSrc)
> [1] "VTI" "VEU"
>> getSymbols(MySymbols[[2]], from = Date_Start, adjust = MyAdjust, src =
>> SymbolSrc)
> [1] "HYG" "CIU"
>> getSymbols(MySymbols[[3]], from = Date_Start, adjust = MyAdjust, src =
>> SymbolSrc)
> Error in download.file(paste(yahoo.URL, Symbol.name, "&a=", from.m, "&b=",  :
>   cannot open URL
> 'http://ichart.finance.yahoo.com/x?s=VNQ&a=0&b=01&c=2000&d=11&e=05&f=2013&g=v&y=0&z=30000'
> In addition: Warning message:
> In download.file(paste(yahoo.URL, Symbol.name, "&a=", from.m, "&b=",  :
>   cannot open: HTTP status was '0 (null)'
>> getSymbols(MySymbols[[4]], from = Date_Start, adjust = MyAdjust, src =
>> SymbolSrc)
> [1] "GLD" "TLT"
>> getSymbols(MySymbols[[5]], from = Date_Start, adjust = MyAdjust, src =
>> SymbolSrc)
> [1] "SHY"
>
>    Anyway, to implement my reading of this paper, I really need adjusted data to take dividends into account and it seems silly to me to keep downloading the same data over and over from Yahoo as I play with this R code. To that end I'm wondering if anyone has a code example for saving Yahoo data once to MySQL and then (presumably) changing what I call SymbolSrc=MySQL for subsequent runs when I work on the R code? I've been searching around in StackOverflow as well as Google and haven't yet found anything very executable.
>
>    Please note that I'm a retired investor, not a trader, and seldom work with data faster than 30 minute bars and even then almost never real-time data, at least in terms of what I would store in MySQL so I'm not overly concerned right now as to whether MySQL is fast or slow. I already run MySQL for a couple of other reasons on my Gentoo box where I'm working in R and would prefer not run another database right now unless _necessary_.
>
>    Clearly, if there's some stupid issue in my code that fixes this that would be great to know about also! :-)
>
> Thanks in advance,
> Mark
>
>
>
> library(quantmod)
>
> Date_Start = "2000-01-01"
> Date_End = format(Sys.time(), "%Y-%m-%d") Sys.setenv(TZ = "UTC") SymbolSrc = "yahoo"
> MyAdjust = TRUE
>
> MySymbols = list()
> MySymbols[[1]]= c("VTI","VEU")
> MySymbols[[2]]= c("HYG","CIU")
> MySymbols[[3]]= c("VNQ","REM")
> MySymbols[[4]]= c("GLD","TLT")
> MySymbols[[5]]= c("SHY")
>
> getSymbols(MySymbols[[1]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[2]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[3]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[4]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc) getSymbols(MySymbols[[5]], from = Date_Start, to = Date_End, adjust = MyAdjust, src=SymbolSrc)
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



More information about the R-SIG-Finance mailing list