[R-SIG-Finance] yahoo dates

G See gsee000 at gmail.com
Thu Jul 12 16:06:35 CEST 2012


The workaround for get.hist.quote sounds like a good one, and I would
love to see something similar added to getSymbols.yahoo.

I did not mean to suggest that there's not a problem; the point I was
trying to get at is that I think it only occurs with historical data,
not "current" data.  So, the user could record their own data.  I
realize that's a little off topic from the original question, though.

To test my theory, I've been "streaming" quotes from yahoo for the
past couple days by repeatedly requesting current quotes.  I've found
that the extra volume never appears using this method.  When the
market closes, the Volume stops changing.  When the market opens, the
volume drops to zero (or close to it).  So, one workaround might be to
replace the last day (and its duplicate, if there is one) with
the data returned by `getQuote`.
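
A minimal sketch of that replacement step, using the IBM rows quoted
further down so it runs offline.  `replace_last_day` is a made-up helper,
not something in quantmod, and `quote_row` is only a stand-in: building
that row from the actual `getQuote` output (which columns map to
Open/High/Low/Close/Volume) is left out here.

```r
library(xts)

# Hypothetical helper (not part of quantmod): drop every row that carries the
# last date in `x`, then append `new_row` in its place.
replace_last_day <- function(x, new_row) {
  last_date <- index(x)[nrow(x)]
  rbind(x[index(x) != last_date, ], new_row)
}

# Rows copied from the IBM output quoted further down this thread.
dates <- as.Date(c("2012-07-06", "2012-07-09", "2012-07-09"))
ibm <- xts(cbind(Open   = c(193.92, 190.66, 190.76),
                 High   = c(193.94, 191.00, 191.00),
                 Low    = c(189.74, 188.05, 188.05),
                 Close  = c(191.41, 189.65, 189.67),
                 Volume = c(4952900, 3569800, 3988100)),
           order.by = dates)

# Stand-in for a row built from getQuote("IBM"); here it simply reuses the
# final row above so that the example needs no network access.
quote_row <- ibm[nrow(ibm), ]

fixed <- replace_last_day(ibm, quote_row)
```

After this, `fixed` has one row per date, and the row for the last date
is whatever was passed in as `quote_row`.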

For reference, here's the code I used to collect the data, although it
probably shouldn't be used as-is: the loop has no delay between requests,
and Yahoo probably doesn't like folks hitting their server this hard.

library(quantmod)
filename <- "~/GSPCintra.csv"
file.create(filename)
# Add headers
cat(paste0("Sys.time,", paste(make.names(colnames(getQuote("^GSPC"))),
                              collapse=","), "\n"), file=filename)

# record data; break with ctrl-c
while(TRUE) {
  try(cat(paste0(Sys.time(), ",", paste(getQuote("^GSPC")[1, ], collapse=","),
                                    "\n"), file=filename, append=TRUE))
}

# retrieve
tmp <- read.table(filename, stringsAsFactors=FALSE, sep=",", header=TRUE)
x <- xts(tmp[, c(6:8, 3, 9)], as.POSIXct(tmp[, 1]))
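
A gentler variant of the recording loop might pause between requests and
stop after a fixed number of polls.  This is only a sketch: `poll_quotes`
and its arguments are hypothetical names, the 60-second default is an
arbitrary choice, and the fetch function is passed in so the loop can be
tried with a stub instead of hitting Yahoo.

```r
# Hypothetical helper: call `fetch()` n times, pausing `interval` seconds
# between calls, and append one CSV line per successful call to `file`.
poll_quotes <- function(fetch, file, n = 10, interval = 60) {
  for (i in seq_len(n)) {
    line <- try(paste0(format(Sys.time()), ",",
                       paste(fetch()[1, ], collapse = ",")),
                silent = TRUE)
    # Skip the write if the request (or the formatting) failed
    if (!inherits(line, "try-error")) {
      cat(line, "\n", sep = "", file = file, append = TRUE)
    }
    if (i < n) Sys.sleep(interval)
  }
  invisible(file)
}

# Offline trial with a stub that returns a one-row data.frame
stub <- function() data.frame(Last = 1352.46, Volume = 2904860000)
out <- tempfile(fileext = ".csv")
poll_quotes(stub, out, n = 3, interval = 0)
recorded <- readLines(out)

# Real use (needs quantmod and a network connection):
# poll_quotes(function() quantmod::getQuote("^GSPC"), "~/GSPCintra.csv")
```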

Cheers,
Garrett

On Tue, Jul 10, 2012 at 4:56 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
> On Mon, 9 Jul 2012, G See wrote:
>
>> FWIW, I download 33 fields from yahoo every night at 10 p.m. CDT using
>
>
> I'm not sure but maybe that is still too early. The problem is real and
> occurs for me "now" (around 10:00 GMT), both for individual stocks and
> indexes:
>
> R> library("quantmod")
> R> getSymbols(c("^GSPC", "IBM"))
> R> tail(GSPC, 3)
>            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
> 2012-07-06   1367.09   1367.09  1348.03    1354.68  2745140000       1354.68
> 2012-07-09   1354.66   1354.87  1346.65    1352.51   399252300       1352.46
> 2012-07-09   1354.66   1354.87  1346.65    1352.46  2904860000       1352.46
> R> tail(IBM, 3)
>            IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted
> 2012-07-06   193.92   193.94  189.74    191.41    4952900       191.41
> 2012-07-09   190.66   191.00  188.05    189.65    3569800       189.67
> 2012-07-09   190.76   191.00  188.05    189.67    3988100       189.67
>
> Note the last line in both cases, especially the volume. The same is
> visible, of course, at the Yahoo! Finance site:
>
> http://finance.yahoo.com/q/hp?s=^GSPC+Historical+Prices
> http://finance.yahoo.com/q/hp?s=IBM+Historical+Prices
>
> Users of Yahoo! Finance also complained about this in the user forum. But as
> nobody could offer a good explanation for this, we implemented a patch in
> tseries' get.hist.quote() that omits the last observation in case its date
> is duplicated:
>

This sounds like something quantmod should consider doing.

> R> library("tseries")
> R> tail(get.hist.quote("IBM"), 3)
>              Open   High    Low  Close
> 2012-07-05 194.88 196.85 193.63 195.29
> 2012-07-06 193.92 193.94 189.74 191.41
> 2012-07-09 190.76 191.00 188.05 189.67
> Warning message:
> In get.hist.quote("IBM") : first date duplicated, first instance omitted
>
> Best,
>
> Z
>


