[R] How to download this data?

Duncan Temple Lang dtemplelang at ucdavis.edu
Sat Aug 3 00:48:38 CEST 2013


That URL is HTTPS (secure HTTP), not plain HTTP, so
the XML parser cannot retrieve the file itself.
Instead, use the RCurl package to get the file.

However, it is more complicated than that. If
you look at the source of the HTML page in a browser,
you'll see a jsessionid, which is a session identifier.

The following retrieves the content of your URL and then
parses it and extracts the value of the jsessionid.
Then we create the full URL to the actual data page (that URL is in the HTML
content too, but inside JavaScript code).

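library(RCurl)
library(XML)

# Fetch the original page over HTTPS and parse the HTML.
rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
rawDoc = htmlParse(rawOrig)

# Take the first href that carries a jsessionid and keep the id itself,
# i.e. everything after "jsessionid=" up to the first "?".
tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
jsession = gsub(".*jsessionid=([^?]+).*", "\\1", tmp)

# Build the URL of the page that actually holds the data.
u = sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)

# Fetch that page and read its tables; the first one is the data.
doc = htmlParse(getURLContent(u))
tbls = readHTMLTable(doc)
data = tbls[[1]]

dim(data)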
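
To see what the regular expression step does without touching the network,
here is a small sketch in base R. The href and the session id in it are made up
for illustration; the real values come from the page.

```r
# Hypothetical href, shaped like the ones in the page source (made-up session id).
href = "/productguide/ProductSpec.shtml;jsessionid=ABC123DEF456?specId=219"

# Keep everything after "jsessionid=" up to, but not including, the first "?".
jsession = sub(".*jsessionid=([^?]+).*", "\\1", href)

# Build the URL of the data page from the session id.
u = sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)

print(jsession)
print(u)
```

If the href had no query string after the session id, the same pattern would
capture everything to the end of the string.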


I did this quickly so it may not be the best way or completely robust, but hopefully
it gets the point across and does get the data.
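
One possible way to make the session id step a little more robust is to check
that a matching href was found before taking the first one, since indexing an
empty result with [[1]] fails with a subscript error. A minimal sketch in base R,
where extractSessionId is a hypothetical helper name and the hrefs are made up:

```r
# Hypothetical helper: pull the jsessionid out of a character vector of hrefs,
# stopping with a clear message if none of them carries one.
extractSessionId = function(hrefs) {
    hits = grep("jsessionid=", hrefs, fixed = TRUE, value = TRUE)
    if (length(hits) == 0)
        stop("no href containing 'jsessionid=' found; the page may have changed")
    # Use the first match and keep the id up to the first "?".
    sub(".*jsessionid=([^?]+).*", "\\1", hits[1])
}

# Made-up hrefs for illustration.
hrefs = c("/about/index.shtml",
          "/productguide/ProductSpec.shtml;jsessionid=ABC123DEF456?specId=219")
id = extractSessionId(hrefs)
print(id)
```

With the real page, something like as.character(xpathSApply(rawDoc, "//@href"))
should give a suitable vector of hrefs, though I have not checked that against
the live site.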

  D.

On 8/2/13 2:42 PM, Ron Michael wrote:
> Hi all,
>  
> I need to download the data from this web page:
>  
> https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry
>  
> I used the function readHTMLTable() from package XML, however could not download that.
>  
> Can somebody help me how to get the data onto my R window?
>  
> Thank you.
> 


