[Rd] Yahoo bug in tseries::get.hist.quote and its::priceIts

Sun Apr 25 01:16:15 CEST 2004

Dirk Eddelbuettel <edd at debian.org> writes:

> Both get.hist.quote, and its derivative priceIts, rely on download.file() to
> fetch financial data series from Yahoo! in .csv format. They allow for nice
> interactive demonstrations of what one can do with R.

Er, how does this affect get.hist.quote? I see some flakiness, but the
basic conversion appears to work:

>      spc <- get.hist.quote(instrument = "spc", start = "1998-01-01")
trying URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Error in download.file(url, destfile, method = method) :
        cannot open URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
In addition: Warning message:
cannot open: HTTP status was `404 Not Found'
>      spc <- get.hist.quote(instrument = "spc", start = "1998-01-01")
trying URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Content type `application/octet-stream' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... ..
downloaded 72Kb

time series starts 1998-01-02
time series ends   2004-04-01

(Yes, that's the same URL, a few seconds later!) 

> Unfortunately, both are currently broken as Yahoo! decided to add a somewhat
> useless html comment at the end of the csv 'stream', breaking the regular
> format of n rows with k columns.  Here is an example for the S&P500 index
> since the beginning of the month (to keep it compact):
> 
> Date,Open,High,Low,Close,Volume,Adj. Close*
> 23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
> 22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
> 21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
> 20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
> 19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
> 16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
> 15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
> 14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
> 13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
> 12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
> 9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
> 8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
> 7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
> 6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
> 5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
> 2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
> 1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
> <!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->
> 
> Is there an _elegant and portable_ way of reading this with the last line?
> I needed this, and used the somewhat clunky 
> 
>     data <- read.csv(destfile)
>     unlink(destfile)
>     data <- data[-(nlines-1),]          # skip very last line with comment
> 
> which uses nlines, which had already been computed (as has an offset of one
> because of the header line).

How about this?

> v <- readLines(url("http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=ibm&x=.csv"))
> x <- read.csv(textConnection(v[-grep("^<!",v)]))
> str(x)
`data.frame':   1586 obs. of  7 variables:
 $ Date       : Factor w/ 1586 levels "1-Apr-02","1-Ap..",..: 786 732 681 629 524 368 315 263 210 157 ...
 $ Open       : num  91.0 90.5 91.2 92.0 91.9 ...
 $ High       : num  91.6 91.5 91.4 92.5 92.3 ...
 $ Low        : num  90.4 89.7 90.7 90.7 91.7 ...
 $ Close      : num  91.3 90.7 91.3 90.7 91.9 ...
 $ Volume     : int  5063200 7988000 4623400 4260200 4159400 1111800 6844200 5316300 5013600 3112600 ...
 $ Adj..Close.: num  91.3 90.7 91.3 90.7 91.9 ...
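
One caveat with the readLines()/grep() approach above: if Yahoo! ever drops
the comment line again, grep() returns integer(0), and v[-integer(0)] selects
nothing at all rather than everything. A minimal sketch of a variant that
copes with both cases is below. It assumes the comment line always starts with
"<!"; read_yahoo_csv is a hypothetical helper name, and the sample lines are
copied from the S&P500 excerpt quoted above so that no download is needed.

```r
# Sample lines taken from the CSV excerpt quoted earlier in this message,
# including the trailing HTML comment. No network access is required.
v <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

# Hypothetical helper (not part of tseries or its): drop every line that
# starts with "<!" and parse the remainder as CSV.
# grepl() returns a logical vector, so when no comment line is present
# nothing is dropped; v[-grep(...)] would return an empty vector instead.
read_yahoo_csv <- function(lines) {
  keep <- !grepl("^<!", lines)
  con <- textConnection(lines[keep])
  on.exit(close(con))
  read.csv(con)
}

x       <- read_yahoo_csv(v)        # input with the trailing comment
x_clean <- read_yahoo_csv(v[1:3])   # same input without the comment
str(x)
```

Both calls should give the same two-row data frame, so the same code would
keep working whether or not the comment line is present.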

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907