[R] Reading in an XLS (really XML) file from website
John McKown
john.archie.mckown at gmail.com
Fri Feb 27 23:05:49 CET 2015
On Fri, Feb 27, 2015 at 10:01 AM, Bos, Roger <roger.bos at rothschild.com>
wrote:
> All,
>
> I am trying to read the S&P 500 constituents from the iShares website
> using the following code:
>
> URL <- "http://www.ishares.com/us/239726/fund-download.dl"
> setInternet2(TRUE)
> download.file(url=URL, destfile="temp.xls")
> out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings",
> header=TRUE, startRow=13)
>
> R returns the following error:
>
> > out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings",
> header=TRUE, startRow=13)
> Error: IllegalArgumentException (Java): Your InputStream was neither an
> OLE2 stream, nor an OOXML stream
> In addition: Warning message:
> In download.file(url = URL, destfile = "temp.xls") :
> downloaded length 1938303 != reported length 200
>
> Upon further examination this is because the format is really XML. Is
> there any way to get XLConnect or any other excel reader to read in an XML
> file? I thought XML was for new Excel format.
>
> Barring that, can we read in the file using the XML package? I tried the
> following code...
>
> require(XML)
> tmp <- xmlParse(URL)
>
> ... but I get this error:
>
> Opening and ending tag mismatch: Style line 14 and Style
> Error: 1: Opening and ending tag mismatch: Style line 14 and Style
>
> Thanks in advance for any help or hints,
>
> Roger
>
>
The problem is indeed on line 14 of the file. The contents of that line
are:
</style>
but should be
</ss:style>
In other words, the file is malformed. After I edited the file to make that
change and saved it, I was able to open it as a spreadsheet in LibreOffice.
I did all of this on my home Linux system. I have no Windows machine here,
and therefore no Excel, so I can't test with Excel. You should be able to
download the file as shown by Raghuraman. On Windows (which I _assume_ you
are using, since most people do), you can edit the file with Notepad or
WordPad. I would use WordPad myself, since Notepad is "iffy" on some
things. Save the file, then try readWorksheetFromFile() again as you
originally did.
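
If you would rather not edit the file by hand each time, the same one-line
repair could be scripted in base R. What follows is only a rough sketch:
fix_closing_tag() is a made-up helper name, "temp_fixed.xls" is an assumed
output file name, and the demo lines are invented rather than copied from
the real file. It also assumes the bad tag stays on line 14, which may not
hold if iShares changes the download.

```r
# Replace the mismatched closing tag on one specific line of the file.
# fix_closing_tag() is a hypothetical helper, not part of any package.
fix_closing_tag <- function(lines, line_no = 14,
                            bad = "</style>", good = "</ss:style>") {
  if (line_no > length(lines)) {
    stop("file has fewer than ", line_no, " lines")
  }
  # fixed = TRUE treats 'bad' as a literal string, not a regular expression
  lines[line_no] <- sub(bad, good, lines[line_no], fixed = TRUE)
  lines
}

# Small in-memory demonstration (made-up lines, not the real file contents)
demo <- c("<ss:style>", "</style>")
fixed <- fix_closing_tag(demo, line_no = 2)
print(fixed)

# Applying it to the downloaded file (input name follows the original post):
# lines <- readLines("temp.xls", warn = FALSE)
# writeLines(fix_closing_tag(lines), "temp_fixed.xls")
```

Only the named line is touched, so any other occurrence of the tag elsewhere
in the file is left alone. Whether the repaired file then parses cleanly is
something you would still need to check yourself.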
--
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown