[R] problems reading XML type file from ishares website

Bos, Roger roger.bos at rothschild.com
Fri Jul 29 15:51:54 CEST 2016


Jeff,

Thanks so much for your help.  I feel pretty confident in saying that there is no way I could have figured out how to open that file (in R) by myself.  It was hard enough to get the data I needed once I could read the file.  In case anyone on the list is interested, here is a working solution to download the S&P 500 weights, though I am sure there is a better way than mine:

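  library(XML)
  library(data.table)  # provides rbindlist()

  # etf must already hold the fund ticker as a string; it only names the local file
  temp <- "https://www.ishares.com/us/239726/fund-download.dl"
  fname <- paste0("C:/pit/", etf, "_FundHoldings.xls")
  download.file(url = temp, destfile = fname)
  txt <- readLines(fname, encoding = "UTF-8-BOM")
  # repair the closing tag that is missing its ss: prefix
  txt <- sub("</Style>", "</ss:Style>", txt)
  # write the repaired text to a new file and parse that instead
  fnamenobom <- "nobom.xml"
  cat(paste(txt, collapse = "\n"), file = fnamenobom)
  xmlfile <- xmlTreeParse(fnamenobom)
  xmltop <- xmlRoot(xmlfile)
  xml_data <- xmlToList(xmlfile)

  # datadate comes from the first worksheet entry
  datadate <- unlist(xml_data[["Worksheet"]][[1]])[1]
  d <- list()
  # build one row per worksheet entry, from the 10th entry onward
  for (ii in 10:length(xml_data[["Worksheet"]])) {
    rd <- unlist(xml_data[["Worksheet"]][[ii]])
    d[[ii]] <- data.frame(Symbol = rd[1], weight = as.numeric(rd[10]), ISIN = rd[31])
  }
  out <- rbindlist(d)
  out$datadate <- datadate
  out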
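
For anyone who wants to try the repair step without downloading anything, here is a minimal sketch in base R only. The stand-in file and its two lines are made up for illustration and are not the real iShares content. It assumes the byte order mark should be dropped on read, which base R does when the connection itself is opened with encoding = "UTF-8-BOM".

```r
# Build a tiny stand-in file: a UTF-8 byte order mark followed by two lines,
# the second of which has the closing tag without its ss: prefix.
# (Hypothetical content, not the real download.)
tmp <- tempfile(fileext = ".xml")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("<ss:Style ss:ID=\"Default\">\n</Style>\n")
writeBin(c(bom, body), tmp)

# Opening the connection as UTF-8-BOM removes the byte order mark on read.
txt <- readLines(file(tmp, encoding = "UTF-8-BOM"))

# Repair the closing tag; fixed = TRUE treats the pattern as a literal string.
txt <- sub("</Style>", "</ss:Style>", txt, fixed = TRUE)
print(txt)
```

The repaired lines can then be written out with cat() and parsed as in the code above.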

Thanks,

Roger


-----Original Message-----
From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
Sent: Thursday, July 28, 2016 2:55 PM
To: Bos, Roger; R-help
Subject: Re: [R] problems reading XML type file from ishares website

Er, I failed to include the step to write the repaired data to a file...

fnamenobom  <- "nobom.xml"
cat( paste( txt, collapse="\n" ), file=fnamenobom )
xmlfile  <- xmlTreeParse( fnamenobom )


On July 28, 2016 11:20:23 AM PDT, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>Please keep the list included in the thread (e.g. reply-all?).
>
>I looked at the file and agree that it looks like xml with a utf8 byte
>order mark and Unix line endings, which means it is not XLS and it is
>not XLSX (which is a zipped directory of xml files with DOS line
>endings). Excel complains but manages to open the file if it has the
>XLS extension,  but I am not aware that any of the usual R Excel
>packages will understand this file.
>
>The byte order mark can be addressed by opening the file with
>encoding="UTF-8-BOM", but as you mentioned originally the XML structure
>is still broken (cf. the error message about the Style ending tag).
>Line 16 seems to use /Style rather than /ss:Style. Maybe
>
>library(XML)
>txt <- readLines( fname, encoding="UTF-8-BOM" )
>txt <- sub( "</Style>", "</ss:Style>", txt )
>fnamenobom  <- "nobom.xml"
>xmlfile  <- xmlTreeParse( "nobom.xml" )
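
The diagnosis in the quoted message (XML text with a UTF-8 byte order mark, rather than XLS or XLSX) can be checked from the first few bytes of a file. The sketch below is one rough way to do that in base R. The function name sniff_format and its return labels are hypothetical, and it looks only at well-known leading signatures without parsing any contents.

```r
# Classify a file by its leading bytes (signatures only; contents are not parsed).
# sniff_format is a made-up helper name for this illustration.
sniff_format <- function(path) {
  b <- readBin(path, what = "raw", n = 8L)
  starts_with <- function(sig) {
    sig <- as.raw(sig)
    length(b) >= length(sig) && identical(b[seq_along(sig)], sig)
  }
  if (starts_with(c(0x50, 0x4B, 0x03, 0x04))) {
    "zip"        # XLSX files are zip archives
  } else if (starts_with(c(0xD0, 0xCF, 0x11, 0xE0))) {
    "ole2"       # legacy binary XLS uses the OLE2 compound file format
  } else if (starts_with(c(0xEF, 0xBB, 0xBF))) {
    "utf8-bom"   # text that begins with a UTF-8 byte order mark
  } else {
    "unknown"
  }
}

# Stand-in file: a byte order mark followed by an XML declaration.
tmp <- tempfile(fileext = ".xls")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("<?xml version=\"1.0\"?>\n")), tmp)
kind <- sniff_format(tmp)
print(kind)
```

A result of "utf8-bom" on a file named .xls would suggest text input, so the usual Excel readers are unlikely to be the right tool for it.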
